(54) Pipelined parallel processor for image processing

(57) A pipelined parallel processor (PPP) integrated 
circuit includes a filtering unit, internal routing circuitry 
such as a crosspoint switch, an internal frame store 
controller, and one or more function circuits. The func- 
tion circuits include, for example, arithmetic units, 
lookup tables, timing compensators, adders/subtrac- 
tors, statistics modules, image shifting circuitry, and 
other useful processing devices. The components of the 
PPP are interconnected with the crosspoint switch that 
routes data among the frame store controller, filtering 
unit, function circuits, external input channels, and 
external output channels. The internal frame store con- 
troller includes a plurality of programmable video line 
store memories that are coupled to an external field or 
frame store memory. Each line store memory may be 
programmed to provide data to, or receive data from 
one of the PPP components by a controller and to trans- 
fer the data from or to the memory, respectively. Access 
to the external memory by the line store memories is 
controlled by an arbitration processor which implements 
a priority scheme. 
Description

Field of the Invention 

[0001] This invention relates to a method and apparatus for processing image data. An illustrative embodiment of
the invention relates to a method and apparatus for performing image processing using pipelined image processing 
operations combined together on a single integrated circuit. 

Background of the Invention 

[0002] Pipelined image processing operations involve the connection of discrete storage and discrete pipelined
image processing components which perform image operations. These image processing operations include, for example,
image convolution, image warping, nonlinear image processing operations, and other specialized processes such
as connected component analysis of binary images. These systems do not integrate the processing and storage circuitry
into single devices. Typically, these components are connected together with external crosspoint switches or dedicated
data flow routing. The use of discrete image storage and processing components results in large circuits that are
difficult to fit on small printed circuit boards. In addition, the use of discrete components and longer circuit paths reduces
the execution speeds of the processes.

[0003] Even discrete processing components, such as pipelined image convolution units, require additional external
circuitry during implementation. For example, an image convolver chip typically has external routing to and from the
chip and external image line delay elements and pixel delay elements. The line and pixel delay elements delay image
data so that simultaneous operations can be performed on a pixel neighborhood. The routing and line delay circuitry
increases the amount of circuit board area and reduces execution speeds.

[0004] Dedicated pipelined image processing components can provide very high performance for image processing
operations at a reduced cost to performance ratio over general-purpose processors such as von Neumann and Harvard
architectures. General-purpose processors typically cannot provide fundamental image operations with the same
performance as pipelined architectures because general-purpose processors are limited to a data read-process-store
operation for each pixel operation. Although performance of the general-purpose processors has improved with
technological advances in caching and other memory management concepts, these processors are typically not suited for
image operations due to the large amount of data that is processed.

[0005] Massively parallel processors and computers can provide very fast performance in comparison to general-purpose
processors that use von Neumann architectures, and can match the processing speeds of pipelined hardware
in some instances. These devices, however, are typically very complex to program and expensive to implement. In addition,
the process of providing image data to each of the processors in a timely manner before parallel execution begins
is generally slow. The parallel processors first load all of the information into the processors, then execute the image
processing operations, and then read the processed data out of the processors to external devices that will perform further
operations on the image data. Although the parallel processors will execute the process step at rates that are typically
much faster than their von Neumann counterparts, the parallel processors are typically limited by the
read-process-store cycle.

[0006] Further, certain image processing operations are difficult to implement with parallel architectures. While
image convolution and other filtering operations that use relatively small pixel neighborhoods can be implemented efficiently
in parallel systems, operations such as image warping are still quite inefficient.

[0007] Image processing applications are typically separated into two components: fundamental, i.e. front-end,
image processing operations, such as filtering, feature extraction, image alignment, and arithmetic operations, and
higher-level processes that operate on the processed image data from the front-end processing to fulfill the requirements
of a particular application. In efficient implementations, the front-end processing is typically performed with dedicated
hardware that can provide processing at a very high performance to cost ratio. The higher-level processing of the
image data is typically performed with general-purpose processors because of their flexibility. Front-end processes typically
incorporate the combination of many discrete steps, however, which makes the implementation of these processes
in dedicated hardware slow and complicated. Aspects of the invention are specified in the claims. The features
of the sub-claims may be combined with other features of the claims in combinations additional to those explicitly set out
in the claims.

Summary of the Invention 

[0008] An embodiment of the present invention provides a pipelined parallel processor (PPP) integrated circuit. The
PPP includes a filtering unit, internal routing circuitry such as a crosspoint switch, an internal frame store controller, and
one or more function circuits. The function circuits may be, for example, arithmetic units, lookup tables, timing compensators,
adders/subtractors, statistics modules, image shifting circuitry, and other useful processing devices. The components
of the PPP are interconnected by the crosspoint switch which routes data between the frame store controller,
the filtering unit, function circuits, external input channels, and external output channels. The internal frame store controller
includes a plurality of programmable bi-directional buffer memories that are coupled to an external field or frame
store memory. Each buffer memory is assigned to provide data to, or receive data from one of the PPP components by
a controller.

Brief Description of the Drawing 

[0009] The invention is better understood from the following illustrative description when read in connection with the
accompanying drawing, in which: 

Fig. 1 is a block diagram of an embodiment of the present invention directed to a pipelined parallel processor inte- 
grated on to a single integrated circuit. 

Figs. 2(a)-2(d) are data structure diagrams of frames of an image signal illustrating the conceptual operation of an
image stabilization process.

Fig. 3 is a flow chart diagram illustrating the image stabilization process of the frames shown in Figs. 2(a)-2(d) in 
accordance with an embodiment of the present invention. 

Fig. 4 is a data structure diagram illustrating the conceptual operation of the image stabilization process shown in Fig. 3.

Fig. 5 is a block diagram of an embodiment of the present invention as a pipelined parallel processor integrated 
onto a single integrated circuit within an image stabilization system. 

Fig. 6 is a block diagram of a PPP 560 and memory 550 shown in Fig. 5 in accordance with a first embodiment of 
the present invention. 

Fig. 7 is a timing diagram useful for explaining the operation of the PPP of Fig. 6.

Fig. 8 is a data flow diagram illustrating the operation of the image stabilization system shown in Fig. 6 in accordance
with an embodiment of the present invention.

Fig. 9 is a block diagram of the frame store controllers 605 and 610 shown in Fig. 6 in accordance with an embodiment
of the present invention.

Figs. 10-12 are data structure diagrams illustrating the read region of interest RD_ROI in memories 550a and 550b
shown in Fig. 6.

Figs. 13-19 are timing diagrams useful for explaining the read operation of frame store controllers 605 and 610
shown in Fig. 6. 

Figs. 20 and 21 are data structure diagrams illustrating the region of interest WR_ROI in memories 550a and 550b
shown in Fig. 6.

Fig. 22 is a timing diagram useful for explaining the write operation of frame store controllers 605 and 610 shown in
Fig. 6. 

Fig. 23 is a block diagram of the interface circuitry 900 of the frame store controllers 605 and 610 shown in Fig. 9 
in accordance with an embodiment of the present invention. 

Fig. 24 is a timing diagram useful for explaining the operation of the interface circuitry shown in Fig. 23.

Fig. 25 is a block diagram of a parallel pipelined processor and memory in accordance with a second embodiment of
the present invention. 

Fig. 26 is a block diagram of a multi-port frame store controller suitable for use with the PPP shown in Fig. 25.

Fig. 27 is a block diagram which illustrates a possible connection of the circuit elements shown in the PPP of Fig. 25
to implement a pyramid filtering function.

Fig. 28 is a timing diagram which is useful for describing an image alignment operation performed using the PPP 
shown in Fig. 25. 

Detailed Description of the Invention 

[0010] In the following description, the term "exemplary" means "an example".

Overview

[0011] The illustrative embodiment of the invention is directed to a Pipelined Pyramid Processor (PPP) which integrates
the most fundamental storage and processing aspects of the front-end image processing operations into a single
ASIC device. The fundamental storage and processing aspects are implemented by a PPP having a filtering unit, internal
routing circuitry, an internal frame store controller, and one or more function circuits. Additional circuitry may be
added to the PPP to implement particular processing operations including arithmetic logic unit (ALU) operations and
digital signal processing core operations. In addition, the components of the PPP may be programmed to perform a
variety of processing operations. In either case, the PPP provides a flexible architecture that may be added to and/or
programmed to perform a variety of image processing operations. The PPP also provides an architecture from which
ASICs may be developed and produced. The filtering unit, internal routing circuitry, and internal frame store controller
provide a standardized architecture and interface for the addition and integration of other components. The amount of
board space required to implement front-end image processing systems is greatly reduced by integrating many components
into a single device. In addition, the effective execution time of the image processing operations is decreased
because on-chip processing is typically performed at higher rates than processing that requires communications
among multiple discrete processing devices; in addition, these pipelined operations may overlap. Finally, the processing
systems can be implemented rapidly and with minimal integration because of the standardized architecture and interfaces.

Overview of First Exemplary Implementation of PPP 

[0012] A first exemplary implementation of the PPP is described below with reference to Fig. 1. The PPP includes 
processing circuitry which is implemented on a single integrated circuit (IC) 5. The circuitry includes a filtering unit 10, 
internal routing circuitry such as a crosspoint switch 40, an internal frame store controller 20, and one or more function 
circuits 30. The function circuits 30 can be arithmetic units, lookup tables, timing compensators, adders/subtractors, 
statistics modules, image shifting circuitry, and other useful processing devices. The devices are interconnected with 
the crosspoint switch 40 which routes image information between the frame store controller 20, filtering unit 10, function 
circuits 30, external input terminals 60 and 65, and external output terminals 70 and 75. The circuitry is configured to 
accept input signals having imbedded timing signals. These timing signals are propagated through the circuitry with 
compensating delays to match the delays of the data signals. These propagated timing signals are imbedded in the
output signals.

[0013] The crosspoint switch 40 is a centralized non-blocking crosspoint switch. Crosspoint switch 40 encodes 
channels of image data along with timing information to enable multiple channels of image information to be processed 
without interference. Crosspoint switch 40 is used to route image data between frame store controller 20, filtering unit 
10, function circuits 30, input channels 60 and 65 and output channels 70 and 75. 
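
The routing role of a non-blocking crosspoint switch can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the class name `Crosspoint`, the port names, and the one-sample-per-clock `step` method are hypothetical and do not describe the actual circuitry of crosspoint switch 40.

```python
class Crosspoint:
    """Toy model of a non-blocking crosspoint: any output may select any input."""

    def __init__(self, inputs, outputs):
        self.inputs = set(inputs)
        # Each output port remembers which input port currently feeds it.
        self.route = {out: None for out in outputs}

    def connect(self, source, dest):
        """Route input port `source` to output port `dest`."""
        if source not in self.inputs or dest not in self.route:
            raise KeyError("unknown port")
        self.route[dest] = source

    def step(self, samples):
        """Forward one sample per connected output for the current clock."""
        return {dest: samples[src]
                for dest, src in self.route.items() if src is not None}


# Hypothetical port names: one input fans out to two destinations.
xp = Crosspoint(["IN1", "IN2"], ["OUT1", "FILTER_IN"])
xp.connect("IN1", "OUT1")
xp.connect("IN1", "FILTER_IN")
routed = xp.step({"IN1": 7, "IN2": 9})
```

Because every output holds its own selection, connecting one output never prevents another output from choosing the same or a different input, which is the sense of "non-blocking" assumed here.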

[0014] The input data signals and the output data signals of frame store controller 20 are connected directly to
crosspoint switch 40 which routes the image data between frame store controller 20 and the other circuitry connected 
to crosspoint switch 40. The frame store controller used in this embodiment of the invention resamples image data sig- 
nals during image read and store operations and executes read and store operations simultaneously. The frame store 
controller 20 may also include an access port (not shown) to allow data stored in the memory 50 to be accessed by the 
control processor 100 through a global data bus 90 for processor read and store operations without passing through 
crosspoint switch 40. 

[0015] Filter unit 10 may be a two-dimensional filter having a separable kernel that can be treated as a combination
of separate horizontal and vertical filters. The five vertical and five horizontal filter coefficients of the exemplary filter unit
10 are programmable within a limited set of values and may be either symmetric or antisymmetric. In addition, the filter
unit 10 may be configured to have either an odd or an even number of taps. The filter unit 10 combined with the frame
store controller 20 may be used for many types of image filtering applications. These include the efficient generation of
multiresolution representations of imagery (i.e. image pyramids) as described in U.S. Patent No. 5,359,674, entitled
PYRAMID PROCESSOR INTEGRATED CIRCUIT, issued to van der Wal, which is incorporated herein by reference.
Multiresolution (pyramid) representations in image processing are used for efficient image processing operations 
including, for example, image feature analysis, image compression, image registration (alignment), and motion estima- 
tion. These applications use the resampling operation of frame store controller 20, described below, for image decima- 
tion and expansion operations that are a part of pyramid generation and reconstruction. 

[0016] Function circuits 30 may implement a variety of different image processing functions. These functions 
include (1) fundamental arithmetic operations at differing levels of precision, (2) arbitrary pointwise single image oper- 
ations, (3) arbitrary pointwise dual image operations, (4) image statistical analysis, and (5) nonlinear image filtering. 
These functions are described below. 

[0017] Fundamental arithmetic operations are used in many image operations. These include the detection of 
changes in image intensity and image motion estimation. These operations use image addition, subtraction, multiplica- 
tion, and division. 

[0018] Arbitrary pointwise single image operations are used to process image data. Both linear and nonlinear func- 
tions of pixel values may be used during image processing to provide useful processing results. These operations 
include adding gain to images, implementing image thresholding operations, and inverting images. A programmable 
look-up table is typically used to implement these operations in a flexible manner. 
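
A pointwise single image operation implemented with a look-up table can be sketched as follows for 8-bit pixel values. The helper names `build_lut` and `apply_lut`, and the particular gain and threshold values, are hypothetical choices for illustration.

```python
def build_lut(func):
    """Tabulate func for every 8-bit input, clamping results to 0..255."""
    return [min(255, max(0, int(func(v)))) for v in range(256)]


def apply_lut(image, lut):
    """Replace every pixel by its table entry."""
    return [[lut[p] for p in row] for row in image]


invert = build_lut(lambda v: 255 - v)
threshold = build_lut(lambda v: 255 if v >= 128 else 0)
gain2 = build_lut(lambda v: 2 * v)

line = [[0, 100, 200]]
inverted = apply_lut(line, invert)
binary = apply_lut(line, threshold)
brighter = apply_lut(line, gain2)
```

Changing the operation only requires reloading the 256-entry table, which is why a programmable table is a flexible way to realize arbitrary functions of a single pixel value.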
[0019] Many operations in front-end processing use arbitrary pointwise operations on two images. The arbitrary
pointwise operations include image masking, image modulo arithmetic (i.e. one image modulo another image),
and other image operations where the resulting image is the same precision as either of the input images. Operations
such as image multiplication, sum-squared, and sum-absolute difference may also be implemented using arbitrary
pointwise operations if the output values are scaled to retain the same precision as the input values. Usually, a programmable
look-up table is used to implement these operations if sufficient memory is available either internal or external to
the PPP integrated circuit (e.g. 64k x 8). Alternatively, many of the functions may be implemented by circuitry directly in
the PPP.
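
The 64k x 8 table size follows from addressing the table with two 8-bit pixel values at once. A minimal sketch is given below; forming the address by placing one pixel in the upper byte and the other in the lower byte is an assumption made for the example, as are the function names.

```python
def build_dual_lut(func):
    """Tabulate func(a, b) for all pairs of 8-bit values: 65536 entries of 8 bits."""
    table = bytearray(65536)
    for a in range(256):
        for b in range(256):
            # Assumed addressing: pixel a in the high byte, pixel b in the low byte.
            table[(a << 8) | b] = min(255, max(0, int(func(a, b))))
    return table


def apply_dual_lut(img_a, img_b, table):
    """Combine two images pixel by pixel through the table."""
    return [[table[(a << 8) | b] for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


abs_diff = build_dual_lut(lambda a, b: abs(a - b))
# The product is scaled back to 8 bits so the output keeps the input precision.
scaled_product = build_dual_lut(lambda a, b: (a * b) >> 8)

diff = apply_dual_lut([[10, 200]], [[30, 50]], abs_diff)
prod = apply_dual_lut([[10, 200]], [[30, 50]], scaled_product)
```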

[0020] Image statistical analysis, such as the generation of image histograms and accumulation over the entire
image or over a subregion of the image, is used to accurately determine appropriate thresholds, to determine an
amount of image energy in a given scene, and to interpret cross-correlation surfaces and other objective functions for
motion estimation and image alignment.
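
The histogram and accumulation functions can be illustrated with a short sketch that processes a rectangular subregion of an 8-bit image. The function name and the (x0, y0, width, height) description of the subregion are hypothetical.

```python
def roi_statistics(image, x0, y0, width, height):
    """Return (histogram, sum) of the 8-bit pixels inside a rectangular subregion."""
    histogram = [0] * 256
    total = 0
    for row in image[y0:y0 + height]:
        for pixel in row[x0:x0 + width]:
            histogram[pixel] += 1
            total += pixel
    return histogram, total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
hist, total = roi_statistics(image, 1, 1, 2, 2)  # covers pixels 5, 6, 8, 9
```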

[0021] Nonlinear image filtering is used to perform spatial operations such as minimum, maximum, and median filtering
which may be used to clean up and otherwise condition image data for further processing. These operations are
typically expensive to implement using general-purpose processors, but are readily available using the PPP. One such
device having functional circuits that may be incorporated into the PPP is a Median/Max/Min Filter, such as the filter part
no. IP90c20 manufactured by Sumitomo Metals.
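
A minimum, maximum, or median filter over a 3-by-3 neighborhood can be sketched as a rank filter. The 3-by-3 window size, the edge replication, and the function name are assumptions for illustration; the sketch does not model the referenced filter part.

```python
def rank_filter_3x3(image, mode="median"):
    """Min, median, or max over each 3x3 neighborhood (edges replicated)."""
    height, width = len(image), len(image[0])
    pick = {"min": 0, "median": 4, "max": 8}[mode]
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append(window[pick])
        out.append(row)
    return out


noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
cleaned = rank_filter_3x3(noisy, "median")  # the isolated bright pixel is removed
```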

[0022] The PPP incorporates a subset of these operations into a single integrated circuit with an internal crosspoint
switch 40 for flexible on-chip routing of the video information through the various processing devices. This allows efficient
implementations of many front-end processes, while integrating the components so they can be efficiently incorporated
into final processing architectures and devices with a minimal amount of circuit board space and cost. The
performance of the PPP in implementing any of these functions is typically an order of magnitude or more better than
the same processing being performed with general-purpose processors, and typically is at a fraction of the cost of massively
parallel processing systems.

[0023] The function circuitry 30 may be, for example, a look-up table (LUT) having a programmable registered
memory capable of performing arbitrary pointwise operations on a single image. The LUT may be used to increase the 
gain of image data, to apply nonlinear functions to the image data, to invert the image data, and to perform other single 
image operations. The functions of the LUT are programmed through global bus 90. Different functions of the LUT may 
be stored in memory which is integrated into the PPP to reduce setup time when programming the LUT. 

[0024] Using 0.5µ or the current state of the art 0.35µ CMOS designs, the minimal implementation PPP may be
incorporated into a single ASIC. Such an ASIC runs at internal speeds of 60 MHz or higher. Using 0.35µ CMOS designs
not only reduces the die size of the PPP, but increases the processing speeds and reduces power dissipation.

[0025] The operation of the different components shown in Fig. 1 is described in greater detail below by illustrating
exemplary applications of the PPP that perform image stabilization, pyramid filtering and image merging. Although the
exemplary embodiments of the present invention are explained with reference to specific image processing applications,
the PPP is applicable to many different image processing applications.

Detailed Description Of The Exemplary Application 

Overview of Image Stabilization

[0026] Video captured by a hand-held camera, or by a tripod-mounted camera with a large telephoto lens, often
exhibits some random image motion. Reducing the image motion is desirable for viewing such video. Image stabilization
is a process that eliminates or reduces a component of the image motion in a video sequence. Accurate image stabilization
greatly enhances the ability to analyze or further process video by means of a digital processor and, therefore,
is considered an important "vision-front-end" image process. Image stabilization may be applied in several modes. One
mode removes all image motion, which makes it appear as if the video camera was completely stationary. Another form
of stabilization smoothes the image motion by eliminating only fast image motion, but allowing for slow image motion,
such as image panning, to occur.

[0027] Image stabilization can be achieved by determining the image motion between consecutive image frames,
and then compensating for this image motion by warping, i.e. transforming, the current frame towards a previous frame
so that its image conforms to the image of that previous frame. An example that shows several consecutive frames of
a video image signal is shown in Fig. 2(a), where t indicates an instant in time. The video image signal includes a
sequence of four video frames F(t), F(t+1), F(t+2), and F(t+3), where there is an apparent random motion of the image
200 in the four frames.

[0028] Fig. 2(b) shows the four frames F(t), F(t+1), F(t+2), and F(t+3) superimposed to illustrate the amount of
image motion among the four frames. Fig. 2(c) illustrates the four frames F(t), F(t+1), F(t+2), and F(t+3) after image
stabilization according to the first method. The subsequent image frames F(t+1), F(t+2), and F(t+3) are aligned to the
frame F(t). Fig. 2(d) illustrates the four frames F(t), F(t+1), F(t+2), and F(t+3) after image stabilization according to a
second method where image stabilization is implemented by "smoothing" the image motion. There is a general motion
of the image to the right in Fig. 2(d).

[0029] Fig. 3 is a flow chart illustrating the stabilization method for stabilizing a video image signal. At step 300, the
method determines the image motion between the current frame F(t) and a previous frame F(t-1). Next, at step 310, the
alignment parameters Cz are generated to align the current image F(t) to the reference frame, or to an image position
at which unwanted image motion is removed. In one exemplary embodiment of the invention described below, the alignment
parameters define a translation of the current image F(t) which aligns it with the previous image. It is contemplated,
however, that the alignment parameters may define a more general image transformation, such as an affine
transformation. At step 320, the current image is re-aligned, i.e. warped, according to the alignment parameters Cz.
The stabilization method of Fig. 3 is illustrated in Fig. 4.

[0030] The motion between the current frame and the previous frame is determined by correlating the two images.
As shown in Fig. 4, a set of image correlations are performed in a hierarchical fashion to determine the amount of image
motion that has occurred between the current frame F(t) and the previous frame F(t-1). To determine the amount of
image motion, an image pyramid is generated from the current frame F(t). For example, the Laplacian pyramid levels
Lz(t-1), for example L0(t-1) to L4(t-1), are generated for the previous frame F(t-1) and the Laplacian pyramid levels
Lz(t), for example L0(t) to L4(t), are generated for the current frame F(t). The correlation between the current frame F(t)
and the previous frame F(t-1) is performed using a sixty-four pixel value by sixty-four pixel value region of the frames.

[0031] For each level, the correlation values Cz(i,j) are generated in accordance with equation (1) below.

$$C_n(i,j) = \sum_{x} \sum_{y} L_n(x+i,\, y+j,\, t) \cdot L_n(x,\, y,\, t-1) \qquad (1)$$

where n = z denotes the pyramid level, x and y are coordinates of the pixel values in the nth level of the Laplacian image
pyramid, and i and j are integers with -2 ≤ i ≤ 2 and -2 ≤ j ≤ 2. For example, when there are four levels of the Laplacian
pyramid, z = 1, 2, 3, or 4. Twenty-five correlation values Cz(i,j) may be produced at pyramid level z = 4 because pyramid
level four includes sixteen pixel values corresponding to the pixels of the frames. In other words, a five pixel value by
five pixel value region correlation at level four represents a total correlation search space of 64-by-64 pixel values in
each of the current frame F(t) and the previous frame F(t-1).
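
Equation (1) can be evaluated in software as in the following sketch, which produces the twenty-five correlation values for shifts of -2 to +2 in each direction. Treating pixels that fall outside the image as zero, and the function and parameter names, are assumptions made for the example.

```python
def correlation_surface(cur, prev, max_shift=2):
    """Evaluate equation (1): sum over x, y of cur(x+i, y+j) * prev(x, y)."""
    height, width = len(cur), len(cur[0])
    surface = {}
    for j in range(-max_shift, max_shift + 1):
        for i in range(-max_shift, max_shift + 1):
            acc = 0
            for y in range(height):
                for x in range(width):
                    xs, ys = x + i, y + j
                    # Assumed border rule: samples outside the image contribute zero.
                    if 0 <= xs < width and 0 <= ys < height:
                        acc += cur[ys][xs] * prev[y][x]
            surface[(i, j)] = acc
    return surface


# A single bright pixel moves from (x=3, y=3) to (x=4, y=2) between frames.
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
prev[3][3] = 1
cur[2][4] = 1
surface = correlation_surface(cur, prev)
best = max(surface, key=surface.get)  # shift (i, j) with the largest correlation
```

In this toy case the largest value lies at the shift (i, j) = (1, -1), which is the displacement of the bright pixel.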

[0032] A simple maximum search is performed on the twenty-five correlation values Cz(i,j) to determine the location
of the maximum correlation value Cz(i,j)max. Once the maximum correlation value has been determined, the correlation
values Cz(i,j) adjacent to and including the maximum correlation value Cz(i,j)max are interpolated to estimate the
maximum peak correlation value Cz(i,j)peak of the correlation values Cz(i,j) to sub-pixel accuracy. The peak correlation
value Cz(i,j)peak provides an estimate of the image motion, relative to the image resolution of the pyramid level at which
the correlation values Cz(i,j) are generated. At level z = 4, the estimate is coarse but motions are estimated over a large
range.
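
The maximum search and sub-pixel interpolation can be sketched as follows. The description does not state which interpolation is used; the sketch assumes a one-dimensional parabolic fit through the maximum and its two neighbors in each direction, which is one common choice. The input is assumed to be a dictionary mapping each shift (i, j) to its correlation value, and the function name is hypothetical.

```python
def subpixel_peak(surface, max_shift=2):
    """Locate the correlation maximum and refine it with a parabolic fit (assumed)."""
    peak_i, peak_j = max(surface, key=surface.get)

    def refine(c_minus, c_zero, c_plus):
        # Vertex offset of the parabola through three equally spaced samples.
        denom = c_minus - 2 * c_zero + c_plus
        return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

    offset_i = offset_j = 0.0
    if -max_shift < peak_i < max_shift:
        offset_i = refine(surface[(peak_i - 1, peak_j)],
                          surface[(peak_i, peak_j)],
                          surface[(peak_i + 1, peak_j)])
    if -max_shift < peak_j < max_shift:
        offset_j = refine(surface[(peak_i, peak_j - 1)],
                          surface[(peak_i, peak_j)],
                          surface[(peak_i, peak_j + 1)])
    return peak_i + offset_i, peak_j + offset_j


# Synthetic quadratic surface whose true peak lies at (0.25, -0.4).
synthetic = {(i, j): -((i - 0.25) ** 2 + (j + 0.4) ** 2)
             for i in range(-2, 3) for j in range(-2, 3)}
estimate = subpixel_peak(synthetic)
```

When the maximum lies on the border of the search range, the sketch leaves that coordinate at the integer location because one neighbor is missing.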

[0033] Next, the peak correlation value Cz(i,j)peak for the next higher resolution level z = 3 is produced. Prior to
generating correlation values between the previous Laplacian image Lz(t-1) and the current Laplacian image Lz(t), the
previous Laplacian image Lz(t-1) is first warped with the coarse motion estimate, peak correlation value C4(i,j)peak,
obtained from the correlation at level four, z = 4. Thus, the correlation computation at level three, z = 3, is a refinement
of the previous motion estimate. At level three, the expected residual motion is less than ± 1 pixel value per line. A
three-by-three value region in the Laplacian image at level three, z = 3, is used to generate the correlation values C3(i,j).

[0034] The image motion estimate is refined again by repeating this process at level 2. Typically, the motion estimate
obtained at this resolution is better than one tenth the pixel resolution of level zero, z = 0, if the image motion was
a pure translation. The peak correlation values Cz(i,j)peak are added and used to warp the current frame F(t) and stabilize
the image.
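
The addition of the per-level estimates can be illustrated as follows. The sketch assumes that each pyramid level halves the resolution of the level below it, so that an offset measured at level z corresponds to 2^z pixels at level zero; this scaling, the dictionary layout, and the function name are assumptions for illustration and are not stated in the description.

```python
def combine_estimates(estimates):
    """Sum per-level offsets, expressed in pixels of the full-resolution frame.

    estimates maps a pyramid level to the (dx, dy) offset measured at that level.
    """
    total_x = total_y = 0.0
    for level, (dx, dy) in estimates.items():
        scale = 2 ** level  # assumed: each level halves the resolution
        total_x += dx * scale
        total_y += dy * scale
    return total_x, total_y


# Coarse estimate at level 4, refinements at levels 3 and 2 (made-up values).
shift = combine_estimates({4: (1.0, -0.5), 3: (0.5, 0.0), 2: (-0.25, 0.25)})
```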

Detailed Description of the Image Stabilization System

[0035] The image stabilization system is described below with reference to Figs. 5-7. Fig. 5 is a block diagram of
the image stabilization system according to an exemplary embodiment of the present invention. A color video signal is
provided to input terminal IN of the video stabilization system 500. The exemplary color video signal is an NTSC video
signal digitized to component digital image format 4:2:2. Alternatively, the color video signal may be a standard component
digital format such as CCIR601, D1 format. The D1 format has 1440 pixels per line, where the luminance component
signal Y has 720 pixel values and the chrominance component signal has two chrominance component signals
each including 360 pixels interleaved into a single data stream. Another possible input may be an MPEG encoded signal
that is decoded into linear scan image format.

[0036] The color video signal is provided to input video decoder 510 which is, for example, a Philips Part No.
SAA7110. The input video decoder 510 digitizes the color video signal and generates two digital video output signal
streams: the luminance component signal Y and the chrominance component signals Cu and Cv, in 4:2:2 format. The
luminance component signal Y is represented at full digitized resolution (i.e. 640 active pixel values per line) and the
chrominance component signals Cv and Cu are subsampled by a factor of two in the horizontal direction (i.e. 320 active
pixel values per line). The chrominance component signals Cv and Cu are interleaved to form 640 pixel values per line.
Although it is shown as a separate unit, it is contemplated that the video decoder 510 may be integrated in the PPP 560.

[0037] Table 1 below illustrates the two data streams, where Cu and Cv are the two color components, Y is the luminance
component, and the index is the pixel value number on the image line.



TABLE 1

Luminance Signal     Y0   Y1   Y2   Y3   Y4   Y5   Y6   Y7   Y8   Y9
Chrominance Signal   Cu0  Cv0  Cu2  Cv2  Cu4  Cv4  Cu6  Cv6  Cu8  Cv8
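
The interleaving of the two chrominance components into a single stream, and its inverse, can be sketched in a few lines. The function names are hypothetical; the sample order (a Cu sample followed by the Cv sample for the same pixel position) follows Table 1.

```python
def interleave_chroma(cu, cv):
    """Merge the Cu and Cv samples of one line into a single stream: Cu, Cv, Cu, Cv, ..."""
    stream = []
    for u, v in zip(cu, cv):
        stream.extend((u, v))
    return stream


def deinterleave_chroma(stream):
    """Split an interleaved chrominance stream back into its Cu and Cv samples."""
    return list(stream[0::2]), list(stream[1::2])


cu_line = ["Cu0", "Cu2", "Cu4"]
cv_line = ["Cv0", "Cv2", "Cv4"]
chroma_stream = interleave_chroma(cu_line, cv_line)
```

With 320 samples per component per line, the interleaved stream carries 640 values per line, the same count as the luminance stream.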



[0038] The luminance component signal Y is provided to frame buffer 530 and the chrominance component signal
is provided to frame buffer 520. The luminance and chrominance component signals are provided to these respective
frame buffers at a clock rate of 12.5 MHz to 13.5 MHz. Each frame buffer 520 and 530 is double buffered so that luminance
component signal Y and the interleaved chrominance component signals Cu and Cv may be stored in one buffer
at a clock rate of 13.5 MHz and read from the other buffer at a clock rate of 60 MHz. An example of frame buffers 520
and 530 is Hitachi part no. HM530281, which can store 332k pixel values of data.

[0039] The luminance component signal Y and the chrominance component signals Cu and Cv are provided to
Pipelined Parallel Processor (PPP) 560 which is a single integrated circuit. PPP 560 implements the stabilization process
described above. PPP 560 is described below with reference to Fig. 6. PPP 560 includes crosspoint switch circuitry
640, filter units 615 and 620, two delays 625a and 625b coupled respectively to multipliers 630a and 630b that are, in
turn, coupled to respective accumulators 635a and 635b. The PPP 560 also includes two frame store controllers 605
and 610 which are respectively coupled to memories 550a and 550b that form memory 550, shown in Fig. 5, and a
look-up table (LUT) 645.

[0040] The crosspoint switch 640 may be the same as crosspoint switch 40, shown in Fig. 1. The luminance component
signal Y is provided to crosspoint switch 640 via terminal IN1 and the chrominance component signals are provided
via terminal IN2. The luminance and chrominance component signals are multibit digital signals each containing
at least eight data bits and two timing bits. The two timing bits convey respective timing signals. One signal, HA, is in a
logic high state when the data in a line is valid (i.e. during the active picture interval) and in a logic low state otherwise
(i.e. during the horizontal blanking interval). The other signal, VA, is in a logic high state when the data in a frame is valid
and in a logic low state otherwise (i.e. during the vertical blanking interval). Fig. 7 is a timing diagram that illustrates the
relative timing of the signal provided to terminal IN1 and the signal provided from output terminal OUT1, as well as the
signals HA and VA for those signals. The exemplary operation shown in this timing diagram delays the output signal by
approximately two line intervals with respect to the input signal.

[0041] The PPP 560 uses the timing signals HA and VA in the same manner as described in the above-referenced
U.S. Patent No. 5,359,674. The timing signals HA and VA remain associated with the input image signal as the input
image signal is processed by the stabilization system 500. As a result, when the stabilized image signal is produced,
the vertical and horizontal blanking periods of the stabilized image signal do not need to be generated. The relative
timing of the stabilized image signal is maintained by associating the timing signals HA and VA with the color video signal
as it is processed. Thus, less circuitry is required.

[0042] The two delay elements 625a and 625b may serve several functions. The delay elements 625a and 625b
provide a programmed delay operation to delay one digital image data stream relative to another digital image data
stream to enable the computation of a cross correlation or other two-operand functions that require a shift location in
the horizontal or vertical direction between two images. The delays 625a and 625b also provide automatic alignment in
time of two image data streams. For example, the delays 625a and 625b may align two image data streams in time if
the two image data streams are to be combined in a two-operand function using, for example, multiplier 630a. The two
image data streams may arrive at the delay elements misaligned in time because they have encountered different
delays through the device. For example, the image data streams may not be aligned if one image data stream is passed
through LUT 645 and the other image data stream is not. By monitoring the timing signals HA and VA associated with
each image data stream, the delay elements 625a and 625b may automatically detect the time difference between the
two image data streams and delay one image data stream such that the two image data streams are aligned in time
when presented to the two-operand function. The programmed delay operation may be combined with the automatic
alignment operation.
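
The automatic alignment can be pictured with a short software sketch. This is only an illustration, not the circuit itself: it assumes each stream is modeled as a list of samples with a parallel list of HA bits, and the names `first_valid` and `align` are hypothetical.

```python
def first_valid(ha):
    """Return the index of the first sample whose HA timing bit is high."""
    return ha.index(1)


def align(stream_a, ha_a, stream_b, ha_b, pad=0):
    """Delay whichever stream becomes valid first so both line up in time.

    Assumption: the delay is modeled by prepending `pad` samples (with HA low)
    to the early stream; the hardware would use a delay line instead.
    """
    lag = first_valid(ha_a) - first_valid(ha_b)
    if lag > 0:  # stream B is early, so delay B by `lag` samples
        stream_b = [pad] * lag + stream_b
        ha_b = [0] * lag + ha_b
    elif lag < 0:  # stream A is early, so delay A by `-lag` samples
        stream_a = [pad] * (-lag) + stream_a
        ha_a = [0] * (-lag) + ha_a
    return stream_a, ha_a, stream_b, ha_b
```

After alignment, the first valid sample of each stream falls on the same index, so a two-operand function such as a multiplier sees corresponding pixels.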

[0043] The exemplary embodiment is not limited to processing multibit digital signals containing eight data bits and 
two timing bits. The exemplary embodiment may be configured to receive multibit digital signals containing, for example, 
10 data bits and two timing bits. 

[0044] Referring to Fig. 6, crosspoint switch 640 enables signals provided via input terminals IN1 and IN2 to be
connected to any of the output terminals OUT1 and OUT2, without interfering or blocking any of the other channels within
the crosspoint switch 640 for maximum parallelism and interconnectivity. The routing performed by the crosspoint
switch 640 is programmable from an external controller 595, shown in Fig. 5, through global data bus 645. Exemplary
circuitry which can be used for crosspoint switch 640 and external controller 595 is the LSI Logic crosspoint switch part
number L64270 QC-40.

[0045] Fig. 8 is a data flow diagram which illustrates the operation of the video stabilization system 500 and,
particularly, the PPP 560, shown in Fig. 5. In general, as is shown in Fig. 8, a number of different processes are implemented
by the PPP 560 during one cycle to stabilize the image.

[0046] In process block 810, a Gaussian image G1(t)Y is generated from a luminance component signal Y provided
to the PPP 560 from frame buffer 530, shown in Fig. 5. In process block 815, the Gaussian image G1(t)Y is used to
produce lower resolution Gaussian images Gn+1(t) and the Laplacian image pyramid, shown in Fig. 4. Each time process
block 815 is repeated, one of the Laplacian images L0(t) through L4(t), shown in Fig. 4, is produced. Although process
block 815 is repeated, the time required to produce the Laplacian image pyramid and the lower resolution Gaussian
images is less than the amount of time required to produce the Gaussian image G1(t)Y in process block 810. This is
because the number of pixel values in the Laplacian and Gaussian images is reduced to, for example, one fourth for
each successive level of the Gaussian and Laplacian images. Thus, there are fewer pixel values to be processed by process block
815.
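
The reduction in work per level can be checked with a few lines of arithmetic. The sketch below assumes each level halves the image in both dimensions (one fourth of the pixels) and uses a hypothetical 256-by-256 input; neither the image size nor the function name comes from the specification.

```python
def level_pixels(width, height, levels):
    """Pixel count at each pyramid level, assuming 2x subsampling per axis."""
    return [(width >> n) * (height >> n) for n in range(levels + 1)]


counts = level_pixels(256, 256, 4)  # hypothetical image size
lower_levels_total = sum(counts[1:])  # all reduced-resolution levels combined
```

Because the counts form a geometric series with ratio one fourth, the lower levels together hold less than one third of the pixels of the full-resolution image.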

[0047] In process block 820, after the Laplacian pyramid has been produced, the Laplacian images L0(t) through
L4(t) and the Laplacian images L0(t-1) through L4(t-1) produced during a previous cycle are used to generate the
alignment parameters C(x, y) as described above with reference to Fig. 4. In process block 825 during the next cycle, the
alignment parameters C(x, y) are used to warp the luminance component signal Y of the input image frame (IIF) from
the previous cycle. In process block 830, the alignment parameters C(x, y) are used to warp the chrominance
component signals Cu and Cv of the IIF from the previous cycle. In process block 835, the chrominance component signal of
the next frame is provided to the PPP 560, shown in Fig. 5, from frame buffer 520. Process blocks 810 through 835 align the
frames of an input image signal. The stabilization process is described in greater detail below.

[0048] The stabilization process performed by the PPP 560 is divided into three steps, where each step consists of two or
more parallel processes. The steps are: (1) generate first Gaussian level, section 800a, (2) generate Laplacian pyramid,
section 800b, and (3) perform hierarchical image correlations, section 800c. The total time required for the PPP 560 to
complete all three steps is about 14 msec assuming a 50 MHz clock. The stabilization process is performed by the PPP
560, and consists of a set of operations performed in parallel and in series, all within one field time. If the video format
is a 30 Hz video format (e.g., NTSC), the field time is about 16.7 msec.
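
The timing budget follows from simple arithmetic, sketched below. The sketch assumes an interlaced format with two fields per frame; the function name and the `slack_ms` variable are illustrative only.

```python
def field_time_ms(frame_rate_hz, fields_per_frame=2):
    """Duration of one field in milliseconds (interlaced video assumed)."""
    return 1000.0 / (frame_rate_hz * fields_per_frame)


PROCESS_TIME_MS = 14.0  # approximate total for the three steps at 50 MHz
slack_ms = field_time_ms(30.0) - PROCESS_TIME_MS  # time left within one field
```

With a 30 Hz frame rate the field time is roughly 16.7 msec, leaving a margin of a little under 3 msec.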

[0049] As is shown in Figs. 6 and 8, the luminance component signal Y is provided from buffer 530, shown in Fig.
5, to crosspoint switch 640, shown in Fig. 6, which routes the luminance component signal Y to frame store controller
610 and filter unit 615. Filter unit 615 is a pyramid filter unit and is the same as the filter unit 10, shown in Fig. 1. As is
shown in Fig. 1, line delay and border control (hereinafter delay/border control) circuitry 111 provides a four or eight-line
delay for the vertical portion of the two-dimensional filter. The delayed lines are processed both vertically and
horizontally in delay/border control circuitry 111. Filter circuitry 110 and line delays and border control circuitry 111 are, for
example, the filter 110, memory 114, address counter 116, and timing circuitry 128, described in the above referenced
patent No. 5,359,674, issued to van der Wal.

[0050] The delay/border control circuitry 111 ensures that border effects are minimized in the filter unit 615 by edge
extending, mirroring, or blanking out the pixel values outside of the region of the image before filtering. The delay/border
control circuitry 111 can also be used to provide spread-tap (resampled) filtering, and can be used to zero out
intermediate pixels in the image for accurate image interpolation operations.
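
The three border treatments can be illustrated on a single image line. This is a minimal sketch under assumptions: the function name `pad_line` is hypothetical, and the mirror mode shown here reflects about the edge pixel without repeating it, which the specification does not state.

```python
def pad_line(line, n, mode):
    """Extend a line of pixels by n samples on each side before filtering.

    Requires n < len(line) for the mirror mode.
    """
    if mode == "extend":  # repeat the edge pixel value
        left, right = [line[0]] * n, [line[-1]] * n
    elif mode == "mirror":  # reflect interior pixels about the edge pixel
        left = line[n:0:-1]
        right = line[-2:-n - 2:-1]
    elif mode == "blank":  # treat pixels outside the image as zero
        left, right = [0] * n, [0] * n
    else:
        raise ValueError("unknown border mode: " + mode)
    return left + line + right
```

For a five-tap filter, two samples of padding on each side (n = 2) let every output pixel be computed from a full set of taps.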

[0051] The output signal provided by filter circuitry 110 is processed by clip processing circuitry 112 that performs
rounding and scaling for single precision signals and combines the filtered data as the more significant bit (MSB)
positions with filtered data representing the less significant bit (LSB) positions to generate double-precision output signals.
Clip processing circuitry 112 corresponds to, for example, clip processing circuitry 124 described in the
above-referenced U.S. Patent No. 5,359,674.

[0052] The signal provided by multiplexer circuitry 115 may be either the input signal INF2 or one of the delayed
horizontal line signals provided by the delay/border control 111. Multiplexer circuitry 115 is coupled to internal
compensating delays (not shown) in the delay/border control 111. The internal compensating delays are used to delay the input
signal INF1 by one to eight line intervals. Multiplexer circuitry 115 selects among the input signal INF1 delayed by
one to eight line intervals or input signal INF2.

[0053] The output signal of multiplexer 115 and clip processing circuitry 112 can be processed by ALU 114 to
produce output signal OUTF1. ALU 114 operates the same as ALU 113 of the above referenced patent No. 5,359,674. ALU
114 performs operations between the filtered image and the nonfiltered image, or can pass one of the images without
modification. Filter unit 615 has two parallel paths which may be used to simultaneously calculate a Gaussian low-pass
filtered image and a Laplacian function (the difference of the input image and the Gaussian) of the input image. As is
shown in Fig. 1, the Laplacian image may be generated by the ALU 114 which receives the Gaussian image from clip
circuitry 112 and a delayed input image from delay/border control 111 and multiplexer 115. The Gaussian and Laplacian
images are used during image stabilization as described below. The generation of Gaussian and Laplacian images
using filter unit 615 is described in the above referenced patent No. 5,359,674.
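
The relationship between the two outputs can be shown on one image line. The sketch below is illustrative only: the five-tap binomial kernel and the edge-extension border are assumptions, since the actual filter coefficients are not given here, and the function names are hypothetical.

```python
# Assumed five-tap binomial low-pass kernel (coefficients sum to 1).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def gaussian(line):
    """Low-pass filter a line of pixels, extending the edge values."""
    r = len(KERNEL) // 2
    padded = [line[0]] * r + line + [line[-1]] * r
    return [
        sum(k * padded[i + j] for j, k in enumerate(KERNEL))
        for i in range(len(line))
    ]


def laplacian(line):
    """Difference of the input line and its Gaussian-filtered version."""
    return [x - g for x, g in zip(line, gaussian(line))]
```

A constant line has no detail, so its Laplacian is zero everywhere, while an isolated bright pixel produces a positive center with negative neighbors.
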

[0054] Returning to Fig. 8, in section 800a, the luminance component signal is provided to filter unit 615 which
generates a Gaussian image signal G1(t)Y of the luminance component signal which is provided via crosspoint switch
circuitry 640 to frame store controller 605. The frame store controller 605 subsamples and stores the Gaussian image
signal G1(t)Y in memory 550a. The subsampling and storing operations of the frame store controller 605 are described
below. The input luminance component signal Y (hereinafter Gaussian image signal G0(t)Y) is provided to frame store
controller 610 which stores the Gaussian image signal G0(t)Y. Frame store controllers 605 and 610 operate in the same
manner and are described below after the description of the operation of the stabilization system 500. In parallel, the
luminance component of the previous image G0(t-1)Y stored in memory 550b is retrieved using frame store controller
610 which generates an intermediate warped image. The intermediate warped image is further warped using filter unit
620. Filter unit 620 may be a separable five by five filter which includes, for example, 10 multipliers.

[0055] The previous image G0(t-1)Y is warped using alignment parameters C(i,j) produced in the previous frame
interval by section 800c, shown in Fig. 8. The peak alignment parameters C(i,j)peak from the previous frame interval
are provided to microprocessor 540. As is described below, process block 820 is repeated to produce the peak
alignment parameters C(i,j)peak between the Laplacian images Ln(t-1) and Ln(t). The peak alignment parameters C(i,j)peak
for each level further refine the warping process to stabilize the image. Microprocessor 540 accumulates the peak
alignment parameters C(i,j)peak generated for each level of the Laplacian pyramid from the most significant bit position to
the less significant bit position. In other words, microprocessor 540 produces an accumulated peak alignment
parameter C(i,j)peak from the alignment parameters produced at each level of the Laplacian pyramid. Microprocessor 540 uses
the accumulated alignment parameter to program filter unit 620 and frame store controller 610 to warp the luminance
component signal of the input image G0(t-1)Y as described below.

[0056] The generation of the alignment parameters is described below. The alignment parameters include an
integer value and a fractional value. An intermediate warped image can be generated using the integer part C(xi, yi) of the
alignment parameters C(x, y). The integer part C(xi, yi) of the alignment parameters C(i,j) includes a horizontal
component xi and a vertical component yi which specify a translation of the previous image G0(t-1)Y in the horizontal and
vertical directions, respectively. The frame store controller 610 retrieves pixel values from memory 550b that are included
in a region of interest RD_ROI, shown in Figs. 10-11, and described below. The region of interest RD_ROI is a
rectangular region in memory 550b, shown in Fig. 6. When pixel values are retrieved from memory 550b using frame store
controller 610, shown in Fig. 6, the size and starting coordinates of the read region of interest RD_ROI in memory 550b
are specified.

[0057] Frame store controller 605 warps the previous image G0(t-1)Y by shifting the read region of interest
RD_ROI, shown in Fig. 10, in response to the horizontal component xi and vertical component yi of the integer part of
the alignment parameters C(i,j). For example, assume the region of interest RD_ROI to be retrieved from memory 550b
has starting coordinate values (RD_HZERO = 2, RD_VZERO = 8) where data value RD_HZERO, shown in Fig. 10, is
the horizontal starting coordinate of the read region of interest RD_ROI and data value RD_VZERO, shown in Fig. 10,
is the vertical starting coordinate of the read region of interest RD_ROI. If the horizontal component xi is four and the
vertical component yi is five, the starting coordinates are adjusted to (RD_HZERO = 6, RD_VZERO = 13). As a result,
image data which is retrieved from memory 550b by frame store controller 610 is warped by the integer part of the
alignment values C(i,j).
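
The integer part of the warp amounts to moving the origin of the read window. The following sketch models the memory as a list of rows; the function names `shift_roi` and `read_roi` are hypothetical, and only the register names RD_HZERO and RD_VZERO come from the text.

```python
def shift_roi(rd_hzero, rd_vzero, xi, yi):
    """Offset the read region's starting coordinates by the integer shift."""
    return rd_hzero + xi, rd_vzero + yi


def read_roi(memory, hzero, vzero, hsize, vsize):
    """Return the hsize-by-vsize window whose upper-left pixel is (hzero, vzero)."""
    return [row[hzero:hzero + hsize] for row in memory[vzero:vzero + vsize]]
```

Using the numbers from the example above, `shift_roi(2, 8, 4, 5)` gives starting coordinates of (6, 13), so every pixel read out is displaced by four columns and five rows relative to the unshifted window.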

[0058] Next, filter unit 620 interpolates the intermediate warped image using the fractional part of the peak
alignment parameter C(i,j)peak to complete the warping of the image. Microprocessor 540, shown in Fig. 5, uses the peak
alignment parameter C(i,j)peak to program the filter unit 620, shown in Fig. 6, to warp the luminance component of the
image. The peak alignment parameter C(i,j)peak specifies the degree to which the image is warped.

[0059] Each of the pixel values in the image is warped by the peak alignment parameter C(i,j)peak to align the
images in the frames as is shown in Figs. 2(a)-2(d). Thus, it is desirable to warp each pixel value in the frame using the
same fractional translation. Therefore, filter unit 620 may be programmed to implement a two-by-two bilinear
interpolation operation. Alternatively, filter unit 620 may be a two-by-two separable filter. One such bilinear interpolation filter is
TRW part number TMC2301 described in Real-time Bilinear Interpolation Using the TMC2241 by Steve Gomez, TRW
LSI Products Division, and dated January 21, 1989, which is incorporated herein by reference for its teachings on
bilinear interpolation.

[0060] For example, if the fractional part C(xf, yf) of the alignment values is 0.3 pixel values in the horizontal
direction, and 0.4 pixel values in the vertical direction, then a two tap horizontal filter with coefficient values 0.7 and 0.3, and
a two tap vertical filter with coefficients 0.6 and 0.4 may be used to implement the bilinear interpolation. By using more
than two taps in the horizontal and vertical sections, the filter unit 620 may implement higher order interpolation
operations to increase the quality of the image interpolation.
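
The coefficient choice in this example can be written out as a small separable computation. This is a sketch only; the function names are hypothetical, and the pixel arguments `p00` through `p11` denote an assumed two-by-two neighborhood (top-left, top-right, bottom-left, bottom-right).

```python
def two_tap(frac):
    """Two-tap filter coefficients for a fractional shift between 0 and 1."""
    return (1.0 - frac, frac)


def bilinear(p00, p01, p10, p11, xf, yf):
    """Separable bilinear interpolation: horizontal pass, then vertical pass."""
    h0, h1 = two_tap(xf)
    v0, v1 = two_tap(yf)
    top = h0 * p00 + h1 * p01        # horizontal filter on the upper row
    bottom = h0 * p10 + h1 * p11     # horizontal filter on the lower row
    return v0 * top + v1 * bottom    # vertical filter across the two rows
```

For a horizontal fraction of 0.3 the taps are 0.7 and 0.3, and for a vertical fraction of 0.4 they are 0.6 and 0.4, matching the coefficients quoted above.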

[0061] Returning to Fig. 8, once section 800a is complete, the steps shown in section 800b are performed. The
Gaussian image Gn(t)Y is retrieved from memory 550a using frame store controller 605. The Gaussian image Gn(t)Y
is filtered using filter unit 615 to produce the next level Gaussian image Gn+1(t)Y. For example, if Gaussian image
G1(t)Y is retrieved from frame store controller 605, filter unit 615 produces Gaussian image G2(t)Y. Filter unit 615
performs the same operations as described above with reference to section 800a and Gaussian image G1(t)Y. Gaussian
image Gn+1(t)Y is then stored in memory 550a using frame store controller 605. Frame store controller 605
subsamples the Gaussian image Gn+1(t)Y before storing the Gaussian image in memory 550a.

[0062] Filter unit 615 also subtracts the Gaussian image Gn+1(t)Y from Gaussian image Gn(t)Y to produce
Laplacian image Ln(t). Laplacian image Ln(t) is then processed using LUT 645 which enhances the Laplacian image Ln(t) to
produce image Ln(t)'. LUT 645 is an eight bit LUT. The enhanced Laplacian image Ln(t)' is then stored in memory 550b
using frame store controller 610.

[0063] The above steps of section 800b are repeated for n equals 2, 3, and 4 to generate Laplacian images L1(t)
through L4(t) and Gaussian images G2(t) through G4(t). As each successive image contains one fourth of the number
of pixels as the previous image, the processing time to repeat the steps of section 800b is approximately the same as
the processing time for generating Gaussian image G0(t) in section 800a. Laplacian images L1(t) through L4(t) are
each stored in memory 550b using frame store controller 610. The above steps for generating Gaussian image Gn+1(t)
and Laplacian images L1(t) through L4(t) are completed in about 3.2 msec.

[0064] In parallel with the steps described above in section 800b, the chrominance component of the previous
Gaussian image G0(t-1)C is retrieved from memory 550b and warped using frame store controller 610 and filter unit
620. The filter unit 620 and frame store controllers 605 and 610 operate in the same manner to warp the chrominance
component as described above to warp the luminance component. The warping of one field or half a frame of the
previous color image data is completed in about 3.7 msec. Once the color component of the previous Gaussian image
G0(t-1)C is warped, it is stored in frame buffer 570. The warped color component is provided to frame buffer 570 via
crosspoint switch 640, shown in Fig. 6, through output terminal OUT1.

[0065] As described above, the stabilization process first performs image correlations over a five-by-five region of
Laplacian image L4 to produce alignment parameters C4(i,j). The maximum alignment parameter C4(i,j)max is then
selected from the alignment parameters C4(i,j). Finally, the peak alignment parameter, C4(i,j)peak, is produced. The
peak alignment parameter is then used to warp Laplacian image L3. In the same manner as for level four, the peak
alignment parameter C3(i,j)peak is produced for level three. The peak alignment parameter C3(i,j)peak is used to warp
the Laplacian image L2. In the same manner as for level three, the peak alignment parameter C2(i,j)peak is produced
for level two. Section 800c of Fig. 8 illustrates how these operations are implemented in the PPP 560.

[0066] Laplacian image L4(t-1) is retrieved from frame store controller 610 and provided to filter unit 620. Laplacian
image L4(t-1) is passed through filter unit 620 and provided to delays 625a and 625b. In other words, Laplacian image
L4(t-1) is not warped. The Laplacian image L4(t) is retrieved from memory 550b using frame store controller 610 and
provided to delays 625a and 625b. The retrieval of Laplacian image L4(t) is delayed to compensate for the time delay
incurred when Laplacian image L4(t-1) is passed through filter unit 620.

[0067] As described above, the first image to be warped in the Laplacian pyramid is Laplacian image L3(t-1) using
alignment parameters C4(i,j). If Laplacian image L4(t-1) is provided to filter unit 620, filter unit 620 passes the Laplacian
image L4(t-1) to delays 625a and 625b without warping.

[0068] Delays 625a and 625b, multipliers 630a and 630b, and accumulators 635a and 635b are used to implement
equation (3) above where n equals 4 (n = 4). Delays 625a and 625b are at least four pixel value delays long so that the
appropriate pixel values in the Laplacian images L4(t-1) and L4(t) may be multiplied using multipliers 630a and 630b
and subsequently accumulated using accumulators 635a and 635b to generate alignment parameters C4(i,j). The
alignment parameters C4(i,j) are provided to statistical unit 647, shown in Fig. 6, which determines the maximum
alignment parameter C4(i,j)max. Next, the alignment parameters C4(i,j) adjacent to the maximum alignment parameter
C4(i,j)max are provided to filter unit 620. Filter unit 620 uses a two-by-two neighborhood of alignment parameters C4(i,j)
including the maximum alignment parameter C4(i,j)max to generate the peak alignment parameter. Filter unit 620 may,
for example, implement a bilinear interpolator to calculate the peak alignment parameter C4(i,j)peak from the
two-by-two pixel neighborhood.
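
The multiply-accumulate search for the best shift can be sketched in software. Equation (3) is not reproduced in this passage, so the form used below (a sum of products between the current image and the previous image offset by (i, j), for shifts of up to two pixels in each direction) is an assumption, as are the function names. The sub-pixel peak refinement is omitted.

```python
def correlate(prev, cur, radius=2):
    """Sum of products between cur and prev offset by (i, j), per shift.

    Assumption: products that would read outside the image are skipped.
    """
    h, w = len(cur), len(cur[0])
    c = {}
    for j in range(-radius, radius + 1):        # vertical shift
        for i in range(-radius, radius + 1):    # horizontal shift
            acc = 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + j, x + i
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += cur[y][x] * prev[yy][xx]  # multiply, accumulate
            c[(i, j)] = acc
    return c


def max_shift(c):
    """Shift (i, j) with the largest correlation, as a statistics unit might report."""
    return max(c, key=c.get)
```

With a radius of two, the search covers the five-by-five set of shifts mentioned for level four.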

[0069] The peak alignment parameter C4(i,j) is provided to processor 540. Generation of the peak alignment
parameter Cn(i,j) includes three iterations of the steps described above in section 800c. The time to complete these
steps is about 6.6 msec. After each iteration, microprocessor 540 accumulates the peak alignment parameters
C4(i,j)peak, C3(i,j)peak, and C2(i,j) for each level of the Laplacian pyramid from the most significant bit position to the
less significant bit position. Microprocessor 540 uses the accumulated alignment parameter to program filter unit 620
and frame store controllers 605 and 610 to warp the images.

[0070] The same steps described above are used to generate the alignment parameters Cn(i,j) for level three (n=3)
and two (n=2) except that the previous image is warped using the accumulated peak alignment parameter. For
example, Laplacian image L3(t-1) is warped by filter 620 in response to the accumulated peak alignment parameter which is
peak alignment parameter C4(i,j) for level 4. Laplacian image L2(t-1) is warped by filter 620 in response to the
accumulated peak alignment parameter which is an accumulation of peak alignment parameter C4(i,j) for level 4 and peak
alignment parameter C3(i,j) for level 3.
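
One way to picture the coarse-to-fine accumulation is to express every level's shift in full-resolution pixels. The sketch below is an interpretation, not the disclosed procedure: it assumes each pyramid level halves the resolution per axis, so a shift of one pixel at level n corresponds to 2^n pixels at level 0. The function name and input format are hypothetical.

```python
def accumulate(peaks):
    """Combine per-level peak shifts into one full-resolution displacement.

    peaks: list of (level, dx, dy), ordered from the coarsest level to the
    finest. Assumption: a shift at level n is worth 2**n full-resolution pixels.
    """
    total_x = total_y = 0.0
    for level, dx, dy in peaks:
        scale = 2 ** level
        total_x += dx * scale
        total_y += dy * scale
    return total_x, total_y
```

Under this reading, the coarsest level contributes the most significant part of the displacement and each finer level adds a smaller correction.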

[0071] In parallel with the steps described above in section 800c, the current chrominance component of the
current image G0(t) is provided to frame store controller 605 and stored in memory 550a via crosspoint switch 640, shown
in Fig. 6. Transfer and storage of the chrominance component takes about 3.7 msec. Pixel values are stored in frame
buffers 570 and 580 at a clock rate of 50 MHz, while data is retrieved from the frame buffers 570 and 580 by the video output
encoder 590 at substantially the same time at a clock rate of 12.5 to 13.5 MHz, which requires about 16 msec to read
the data stored in the buffers. The frame buffers 570 and 580 are double buffered.

[0072] Returning to Fig. 5, the stabilized luminance component signal Y and the stabilized chrominance component
signals Cv and Cu are respectively provided from frame buffers 580 and 570 to encoder 590. Encoder 590 combines
the components to produce a color video signal at output terminal OUT. The encoder 590 is, for example, a Philips part
number SAA7187 device.

[0073] In Fig. 5, microprocessor 540 is used to program the encoder 590 and decoder 510, and to control the image
processing functions of the PPP 560, via processor signal PS. Microprocessor 540 is also used to program registers in
the frame store controllers 605 and 610 for each of the steps of the stabilization process described above.
Microprocessor 540 may operate at a rate of, for example, 20 MHz, so that reprogramming time between each processing step
of the stabilization process is minimized. Microprocessor 540 has a program memory for storing code to operate the
stabilization system 500. Microprocessor 540 may also include connections to an external EPROM, e.g. a FLASH
EPROM (not shown), for boot-up procedures for the stabilization system. In an alternative embodiment, microprocessor
540 may be incorporated into PPP 560. This may reduce the manufacturing cost of such a system.

[0074] Control circuitry 595 controls the interface between input and output buses of decoder 510 and encoder 590,
and the interface and counters (not shown) to control frame buffers 520, 530, 570, and 580. Control circuitry 595 also
extracts the timing signal HA and signal VA which are used to control the operation between buffers 520 and 530 and
PPP 560, and between the buffers 570 and 580 and encoder 590. Control circuitry 595 also controls the interface
between microprocessor 540 and PPP 560. Control circuitry 595 controls these operations via control signals CS.

Description of Frame Store Controller 

[0075] The frame store controllers 605 and 610 and memories 550a and 550b, shown in Fig. 6, are described
below. An alternate frame store controller is described below with reference to Figs. 25-28. The frame store controllers
605 and 610 provide triple port access to memories 550a and 550b for video processing. The triple port access
includes the substantially simultaneous writing of a continuous video data stream into memories 550a and 550b, shown
in Fig. 6, reading a continuous video data stream from memories 550a and 550b, and providing random access to data
in memories 550a and 550b.

[0076] The Gaussian image signal Gn(t) and Laplacian image signal Ln(t) are retrieved and stored in memories
550a and 550b during the stabilization process. The retrieval and storage of image data in memories 550a and 550b
are described below. The key signals provided to and from frame store controller 605 are shown in Fig. 9.

[0077] The frame store controllers 605 and 610, shown in Fig. 6, are responsible for implementing pipelined read
and store operations within memories 550a and 550b in a dual-ported fashion. The frame store controllers 605 and 610
execute rectangular region-of-interest (ROI) operations for data retrieval or storage operations, because the images are
stored and retrieved in a two-dimensional manner from memories 550a and 550b. The frame store controllers 605 and
610 may also resample the images on read and store operations, with nearest-neighbor pixel interpolation on read
expansion operations and decimation on store operations.
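
The two resampling directions can be illustrated on small arrays. This is a minimal sketch with hypothetical function names; the factor of two is an assumption chosen to match the pyramid subsampling, not a stated limit of the controllers.

```python
def decimate(image, factor=2):
    """Store-side resampling: keep every factor-th pixel and line."""
    return [row[::factor] for row in image[::factor]]


def expand_nearest(image, factor=2):
    """Read-side expansion: repeat each pixel (nearest-neighbor interpolation)."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]
```

Decimating a four-by-four image by two yields a two-by-two image, and expanding a two-by-two image by two yields a four-by-four image in which each source pixel fills a two-by-two block.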

[0078] The frame store controllers 605 and 610 also perform single pixel value retrieval and store operations in a
random access manner from the global bus 670, shown in Fig. 6. This allows processor 540 and control circuitry 595
to access the memories 550a and 550b independently of the pipelined retrieval and store operations that may be
executed within the frame store controllers 605 and 610. Access port 2340 for the global bus is shown in Fig. 23. In the
exemplary embodiment of the invention, access port 2340 of the frame store controller is enhanced using caching to
take advantage of the faster accessing modes of the frame store memories 550a and 550b, shown in Fig. 6, when pixel
accesses occur over the global bus.

[0079] Fig. 23 is a block diagram of the interface circuitry of the frame store controllers 605 and 610 which
implements the triple port access. The interface circuitry includes FIFOs, data caches, and control logic for controller retrieval
and storage operations. As is shown in Fig. 24, access to memories 550a and 550b, shown in Fig. 6, is scheduled so
that optimum use can be made of the data burst capability of the memories 550a and 550b.

[0080] Image data is retrieved by the frame store controller 605 in a continuous raster scan from a region of interest
RD_ROI, shown in Figs. 10-12, synchronous with read clock timing signal RD_CLK, shown in Figs. 13-15. When data
is retrieved from memory 550a, shown in Fig. 6, two timing signals RD_HA and RD_VA are generated which indicate
when the retrieved data signal RD_OUT, shown in Figs. 13-19, is valid. Figs. 13-19 and 22 are timing diagrams which
illustrate the operation of the frame store controller shown in Fig. 9. Timing signal RD_HA is in a logic high state when
the data in a line is valid (i.e. during the active picture interval) and in a logic low state otherwise (i.e. during the
horizontal blanking interval). Timing signal RD_VA is in a logic high state when the data in a frame is valid (i.e. during the
active picture interval) and in a logic low state otherwise (i.e. during the vertical blanking interval). Alternatively, a field
signal, described below, may be used to read data using interlace read operations.
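
The raster-scan read with its two valid flags can be modeled as a generator. The sketch rests on assumptions not stated in the text: the blanking length is passed as a plain `hblank` count, blanked samples carry a data value of zero, and RD_VA is taken to stay high through the horizontal blanking of active lines. The function name is hypothetical.

```python
def raster_read(memory, hzero, vzero, hsize, vsize, hblank):
    """Yield (data, rd_ha, rd_va) samples for a region of interest.

    Each active line produces hsize valid samples followed by hblank
    blanking samples during which RD_HA is low.
    """
    for row in range(vzero, vzero + vsize):
        for col in range(hzero, hzero + hsize):
            yield memory[row][col], 1, 1   # valid pixel: both flags high
        for _ in range(hblank):
            yield 0, 0, 1                  # horizontal blanking: RD_HA low
```

A downstream stage needs only the two flags to know which samples belong to the image, which is why the blanking intervals need not be regenerated later.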

[0081] In addition, each horizontal active period can be delayed in response to a signal RD_HOLD, shown in Fig.
9, provided to frame store controllers 605 and 610, shown in Fig. 6. Fig. 16 illustrates the operation of the frame store
controller 605 when the signal RD_HOLD is used. When the signal RD_HOLD is low and the register RD_HOLD_ENA is
set, the read operation is held during the time specified by data value RD_HBLANK, shown in Figs. 14 and 16, until
signal RD_HOLD is high. Alternatively, the read operation timing can be slaved to the horizontal drive signal EXT_HDn and
vertical drive signal EXT_VDn, each shown in Figs. 17-19, when the bit value RD_EXT_ENA is set. Figs. 17-19
illustrate the operations of the frame store controller 605 when read operation timing is slaved to external drive signals.
Retrieval of data is initiated in response to RD_SYNCn signal, shown in Figs. 14 and 16, after the register RD_ENA has
been set. The signal RD_HOLD allows the horizontal and vertical blanking intervals to be inserted into the video signal
as it is displayed. The read operation is held because only active video information is stored into the memories 550a
and 550b.

[0082] As is shown in Figs. 13 and 14, when the read ready signal RD_RDY is logic-one, it indicates the data
retrieval operation is idle. The read ready signal RD_RDY is changed to a logic-zero after the register RD_ENA is
accessed. The read ready signal RD_RDY changes to a logic-one when the data retrieval operation is activated and
changes to a logic-zero after the signal RD_SYNCn pulse and until the data retrieval operation is complete. In
continuous mode operation, the read ready signal RD_RDY remains a logic-zero.

[0083] The image data to be retrieved is held in memory 550a, shown in Fig. 6. Memory 550a is organized as a
two-dimensional array. The logical data bit size, i.e. depth, of the memory 550a is 8 bits or one byte. Data is physically
stored, however, as 16 bit data, with even pixel values X0, X2, X4, X6,... stored in the upper byte, and the odd pixel
values X1, X3, X5, X7,... stored in the lower byte.
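
The byte packing can be sketched as follows. The function names `pack` and `unpack` are hypothetical, and the sketch assumes an even number of pixels per line; only the even-upper, odd-lower byte assignment comes from the text.

```python
def pack(pixels):
    """Pack 8-bit pixels into 16-bit words: even pixel upper, odd pixel lower."""
    if len(pixels) % 2:
        raise ValueError("expected an even number of pixels")
    return [
        ((pixels[i] & 0xFF) << 8) | (pixels[i + 1] & 0xFF)
        for i in range(0, len(pixels), 2)
    ]


def unpack(words):
    """Recover the 8-bit pixel sequence from 16-bit words."""
    pixels = []
    for w in words:
        pixels += [(w >> 8) & 0xFF, w & 0xFF]
    return pixels
```

Each 16-bit access therefore moves two logically adjacent pixels, which halves the number of memory cycles needed for a raster read.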

[0084] The width (i.e. - horizontal size) and height (i.e. - vertical size) of memory 550a is one Mbyte and is arranged
as a one Kbyte by one Kbyte array. Memory 550a is not limited to one Mbyte and may vary depending on the particular
application and memory used. Memory 550a may be, for example, two 256k by 16 bits video RAM (VRAM) chips where
the combined VRAM chips hold 1024K 8-bit pixel values. Therefore, memory 550a has a logical address structure
which stores 1024 pixel values by 1024 pixel values.

[0085] Data stored in memory 550a is retrieved using a continuous raster scan type read from a region of interest
RD_ROI, shown in Figs. 10-12, synchronous with signal RD_CLK, shown in Fig. 15. After all read parameters are
programmed, described below, retrieval of data from the memory 550a is initiated in response to signal RD_ENA, shown
in Figs. 13-14. As is shown in Figs. 13-15, data signal RD_OUT and timing signals RD_VA and RD_HA are generated
in response to the signal RD_SYNCn pulse. The data retrieval operations of frame store controllers 605 and 610 are
synchronized by programming the registers (described below) of each frame store controller 605 and 610, setting
register RD_ENA, and then providing signal RD_SYNCn pulse to both frame store controllers 605 and 610. For example,
as shown in Fig. 8, Gaussian image signals G0(t-1) and Gn(t) respectively are retrieved from frame store controllers
610 and 605 at the same time.

[0086] The first pixel, i.e., the first valid data, of the data retrieval operation is defined by the data values stored in
registers RD_HZERO and RD_VZERO. The first pixel appears a predetermined number of clock cycles of the read timing
signal RD_CLK (shown in Fig. 15) after a pulse in signal RD_SYNCn. The appearance of the first pixel value in clock
cycles is defined by the combination of data values stored in registers RD_HSTART and RD_VSTART and an inherent
time delay RD_HDELAY. RD_HDELAY is the inherent fixed horizontal time delay of the read operation relative to
the signal RD_SYNCn pulse. Data values in registers RD_HSTART and RD_VSTART are programmable so that read
operations from two sources can be aligned in time. These registers are programmed using microprocessor 540 via
signal PS.

[0087] The region of interest RD_ROI is shown in Figs. 10-12. The region of interest is defined by data values
RD_HZERO, RD_VZERO, RD_HSIZE, and RD_VSIZE. Data value RD_HZERO is the horizontal, i.e., column, coordinate
value of the upper left most pixel value of the region of interest RD_ROI. Data value RD_VZERO is the vertical, i.e., row,
coordinate value of the upper left most pixel value of the region of interest RD_ROI. Data value RD_HSIZE is the
horizontal size, i.e., width, of the region of interest RD_ROI to be retrieved and specifies the number of clock cycles
(pixel values) for which a horizontal line of the region of interest is to be read. Data value RD_VSIZE is the vertical
size, i.e., height, of the region of interest RD_ROI to be retrieved. As shown in Fig. 12, if the region of interest
crosses the memory boundary, the retrieval of data wraps around the boundaries of the image data stored in memory 550a.

[0088] As is described above, during the stabilization process shown in Fig. 8, frame store controllers 605 and 610
produce an intermediate warped image which is warped by the integer portion of the alignment parameters Cn(i,j). The
intermediate warped image is generated by adjusting the data values in registers RD_HZERO and RD_VZERO. The
horizontal component x is added to the data value in register RD_HZERO and the vertical component y is added to the
data value in register RD_VZERO to adjust the starting position of the region of interest RD_ROI, shown in Figs. 10-12.
By adjusting the starting position of the region of interest RD_ROI, the image data is effectively warped (i.e. translated
horizontally and vertically) to produce the intermediate warped image when the image data is retrieved from memories 550a
and 550b.
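
The region-of-interest addressing, including the wrap-around at the memory boundary and the integer translation obtained by offsetting the starting coordinates, can be sketched as follows. The function `read_roi` is a hypothetical name, the memory is modeled as a small list of rows rather than the 1024 by 1024 array, and a zoom factor of one is assumed.

```python
def read_roi(memory, hzero, vzero, hsize, vsize):
    """Return an hsize-by-vsize region whose upper-left pixel is (hzero, vzero).

    Column and row coordinates wrap around at the memory boundaries.
    """
    rows = len(memory)
    cols = len(memory[0])
    return [[memory[(vzero + r) % rows][(hzero + c) % cols]
             for c in range(hsize)]
            for r in range(vsize)]

# A 4 x 4 test memory whose value at row r, column c is 10 * r + c.
memory = [[10 * r + c for c in range(4)] for r in range(4)]

# A region starting in the last row and column wraps to row 0 and column 0.
wrapped = read_roi(memory, hzero=3, vzero=3, hsize=2, vsize=2)

# Translating by integer components (x, y) = (1, 2) only changes the start.
x, y = 1, 2
shifted = read_roi(memory, hzero=0 + x, vzero=0 + y, hsize=2, vsize=2)
```

In this model a translation costs nothing beyond changing two starting values, which is consistent with the text's use of the read registers to apply the integer part of the warp.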

[0089] After data retrieval is initiated, each pixel value in the horizontal line with size RD_HSIZE is retrieved in
consecutive order at a rate determined in response to read timing signal RD_CLK, shown in Fig. 15. There is a minimum
programmable time between each horizontal line scan retrieval which is defined by the data value stored in register
RD_HBLANK. This value specifies a number of read cycles corresponding to a horizontal blanking interval. The data
value stored in register RD_VSIZE specifies the number of lines to be read. After the number of lines defined by register
RD_VSIZE are retrieved, the data retrieval operation is completed.

[0090] Alternatively, the data retrieval operation of the frame store controller 605 may be set to continuous
operation, where the read operation will repeat after a programmable number of lines defined by data value RD_VBLANK.
The continuous operation can be terminated by resetting the control bit that enables the continuous operation.

[0091] All read operations can be terminated and placed in an idle state by setting the data value in register
RD_RST. The signal RD_HA, shown in Figs. 13-19, is changed to a logic-zero when data retrieval is terminated.
Otherwise, the signal RD_HA is continuously generated until register RD_ENA is set. Retrieval of image data is terminated
by setting register RD_ENA. In this case, the next image data retrieval operation is initiated by the frame store controller
605.

[0092] A zoom factor may also be applied to the image data during the retrieval operation. The region of interest
RD_ROI may be adjusted independently in the horizontal and the vertical direction of the image. The number of pixel
values in a line and the number of lines are increased by repeating the pixel values/lines retrieved from the region of
interest by the respective data values in registers RD_HZOOM and RD_VZOOM (not shown). Registers RD_HZOOM and
RD_VZOOM are in frame store controllers 605 and 610. The image may be adjusted by a factor of, for example, one,
two, or up to sixteen. The data values in registers RD_HSIZE and RD_VSIZE represent the size of the image after the zoom
factors have been applied.

[0093] An example of a data retrieval which performs a zooming operation is described below. The read region of
interest RD_ROI and the zoom data values are: RD_HZERO = 10; RD_VZERO = 8; RD_HSIZE = 64; RD_VSIZE = 60;
RD_HZOOM = 2; RD_VZOOM = 2; RD_HSTART = 20; RD_VSTART = 2; RD_HBLANK = 32; RD_VBLANK = 5; and
RD_HDELAY = 6.

[0094] In response to these data values, an image segment having 32 pixel values by 30 pixel values is retrieved
starting at logical coordinate values [10, 8] in memory 550a. The frame store controller 605 increases the number of
pixel values by a factor of two by repeating each pixel value in the horizontal and vertical directions. For example, if pixel
values X11, X12, X13,... are retrieved from line one and pixel values X21, X22, X23,... are retrieved from line two, in the
region of interest RD_ROI stored in memory 550a, shown in Fig. 6, frame store controller 605, shown in Fig. 6,
produces four corresponding lines in signal RD_OUT as shown in Table 2 below.



TABLE 2

Line Number    Pixel Values

1              X11, X11, X12, X12, X13, X13...
2              X11, X11, X12, X12, X13, X13...
3              X21, X21, X22, X22, X23, X23...
4              X21, X21, X22, X22, X23, X23...
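
The replication that produces the lines of Table 2 can be sketched in a few lines of Python. The function name `zoom` is hypothetical and the pixel values are represented by their labels; the sketch only illustrates the repetition of pixel values and lines, not the controller's timing.

```python
def zoom(lines, hzoom, vzoom):
    """Repeat each pixel value hzoom times and each line vzoom times."""
    out = []
    for line in lines:
        stretched = [pixel for pixel in line for _ in range(hzoom)]
        out.extend(list(stretched) for _ in range(vzoom))
    return out

# Two lines of the region of interest, as in the example above.
roi = [["X11", "X12", "X13"], ["X21", "X22", "X23"]]

for number, line in enumerate(zoom(roi, hzoom=2, vzoom=2), start=1):
    print(number, ", ".join(line))
```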



[0095] The first pixel value is provided from frame store controller 605 two hundred eighteen clock cycles of
read clock signal RD_CLK, shown in Fig. 15, after the negative edge of signal RD_SYNCn pulse, shown in Figs. 13
and 14, as determined in accordance with equations (2) and (3) below.

RD_HDELAY + RD_HSTART + RD_VSTART * line_time = 6 + 20 + (2 * 96) = 218 (2)

line_time = RD_HSIZE + RD_HBLANK = 96 (3)

[0096] The total time to provide all of the pixel values from the frame store controller is 5728 clock cycles of read
clock signal RD_CLK and is determined in accordance with equation (4) below.

RD_VSIZE * line_time - RD_HBLANK = 60 * 96 - 32 = 5728 (4)

[0097] If the frame store controller 605 is operating in a continuous data retrieval mode, the time between the last
pixel of the previous image and the first pixel of the next image is 512 clock cycles of read clock signal RD_CLK and is
determined in accordance with equation (5) below.

RD_VBLANK * line_time + RD_HBLANK = 5 * 96 + 32 = 512 (5)

[0098] The total time to retrieve an entire image frame stored in memory 550a in continuous operation is 6240 clock 
cycles of read clock signal RD_CLK and is determined in accordance with equation (6) below. 

(RD_VSIZE + RD_VBLANK) * line_time = 6240 (6) 
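
The timing relations of equations (2) through (6) can be checked with a short calculation. The sketch below uses the register values of the zoom example; the function name `read_timing` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def read_timing(hsize, vsize, hstart, vstart, hblank, vblank, hdelay):
    """Evaluate equations (2) through (6) in cycles of RD_CLK."""
    line_time = hsize + hblank                                  # equation (3)
    return {
        "line_time": line_time,
        "first_pixel": hdelay + hstart + vstart * line_time,    # equation (2)
        "active_total": vsize * line_time - hblank,             # equation (4)
        "inter_frame_gap": vblank * line_time + hblank,         # equation (5)
        "frame_period": (vsize + vblank) * line_time,           # equation (6)
    }

t = read_timing(hsize=64, vsize=60, hstart=20, vstart=2,
                hblank=32, vblank=5, hdelay=6)
print(t)
```

The results of equations (4) and (5) add up to the result of equation (6), since 5728 + 512 = 6240.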

[0099] The frame store controller 605 performs data retrieval in progressive (non-interlace) mode. When the bit
value RD_INTERLACE of the register RD_CTRL is changed to a logic-one, data retrieval is performed in interlace mode.
The read region of interest RD_ROI is a frame including two fields. One field is half a frame consisting of alternate lines
of the frame. Field one is the field that starts with the first line of the frame and field two is the field that starts with the
second line of the frame.

[0100] The bit values RD_FIELD1_ENA and RD_FIELD2_ENA of register RD_CTRL identify which fields are
retrieved. For example, if both bit values RD_FIELD1_ENA and RD_FIELD2_ENA are a logic-one, then a single data
retrieval operation includes the retrieval of two consecutive field images, field one followed by field two. The time
duration between the retrieval of the two fields is defined by data value RD_VBLANK. If only one of the bit values
RD_FIELD1_ENA and RD_FIELD2_ENA is a logic-one, then the single data retrieval operation includes the retrieval of
only one field. For example, if bit value RD_FIELD1_ENA is a logic-one and bit value RD_FIELD2_ENA is a logic-zero,
field one is retrieved. If bit values RD_FIELD1_ENA and RD_FIELD2_ENA are both logic-zero, then no read operation
occurs. Signal RD_FLD1 indicates which field is being read out. When signal RD_FLD1 is a logic-one, field one is being
retrieved and when signal RD_FLD1 is a logic-zero, field two is being retrieved.
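
The order in which frame lines are read in interlace mode can be sketched as follows. The function `field_lines` is a hypothetical helper, and the sketch assumes that the vertical size counts the lines of the whole frame and that no zoom is applied; neither assumption is stated in the text.

```python
def field_lines(vzero, vsize, field1_ena, field2_ena):
    """List the frame lines read by one interlaced retrieval, field one first."""
    frame = range(vzero, vzero + vsize)
    lines = []
    if field1_ena:
        lines += list(frame[0::2])   # field one starts with the first line
    if field2_ena:
        lines += list(frame[1::2])   # field two starts with the second line
    return lines

both = field_lines(vzero=8, vsize=6, field1_ena=True, field2_ena=True)
only_one = field_lines(vzero=8, vsize=6, field1_ena=True, field2_ena=False)
neither = field_lines(vzero=8, vsize=6, field1_ena=False, field2_ena=False)
```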

[0101] All data retrieval may be continuous or non-continuous. When data retrieval is continuous, it is initiated in the
same manner as the non-continuous data retrieval. Data values stored in registers RD_HSTART and RD_VSTART are
used to retrieve the first frame and are not subsequently used to determine the start of the retrieval of subsequent
frames or fields. After the first frame or field is retrieved, the data retrieval operation is repeated. There is a blanking
interval between the repeated data retrieval operations set by data value RD_VBLANK.

[0102] When data is retrieved continuously, signal RD_RDY is a logic low. The continuous mode may be stopped
by changing bit value RD_CONTINUOUS of register RD_CTRL to logic-zero. The current data retrieval operation is
then completed and the read ready signal RD_RDY is changed to a logic-one. An interrupt request output may be
generated after the completion of a data retrieval operation using register RD_IRQ_ENA. This data bit value is also written
into the FS_STATUS register. The default setting is zero which indicates disabled.

[0103] The read hold operation is described below with reference to Fig. 16. During the read hold operation, the
active line time of signal RD_HA is delayed in response to signal RD_HOLDn, shown in Fig. 16. When signal
RD_HOLDn is a logic-zero the active line time is delayed. When signal RD_HOLDn is changed to a logic-one, the line
data becomes active after a number of read cycles defined by data value RD_HOLD_DELAY which specifies the delay
as a number of clock cycles. Signal RD_HOLDn is changed to a logic low when bit value RD_HOLD_ENA of register
RD_CTRL is set to a logic-one.

[0104] The active line time is delayed when the output data signal RD_OUT is provided to a device with limited or
asynchronous data throughput, such as a FIFO. It is desirable that the component circuitry to which data signal RD_OUT
is supplied be able to hold an entire line of the frame. Signal RD_HOLDn is an asynchronous signal and should be
internally synchronized with signal RD_CLK.

[0105] Figs. 17-19 are timing diagrams illustrating the operation of the frame store controller 605 when the data
retrieval operation is slaved to signals EXT_HDn and EXT_VDn. Fig. 17 illustrates the operation of the frame store
controller 605 when (1) the first field is retrieved in interlace mode or (2) a frame is retrieved in non-interlace mode.
Fig. 18 illustrates the operation of the frame store controller 605 when the second field is retrieved in interlace mode.
Fig. 19 illustrates the operation of the frame store controller 605 when the data retrieval operation is slaved to signals
EXT_HDn and EXT_VDn and for different horizontal sizes of the region of interest, shown in Figs. 10-12, specified by
the data value in register RD_HSIZE.

[0106] When the bit value RD_EXT_ENA of register RD_CTRL is one, the data retrieval operation is slaved to
signals EXT_HDn and EXT_VDn and frame store controller 605 ignores signal RD_SYNCn pulses. After register
RD_EXT_ENA is set, the data retrieval operation is initiated when signals EXT_HDn and EXT_VDn are a logic-zero at
the same time. In non-interlace mode, data retrieval of a frame is initiated and in interlace mode, the data retrieval of
the field is initiated.

[0107] The first active line may be delayed by a number of lines determined by data value RD_VSTART in response
to the number of falling edges of signal EXT_VDn. The number of lines is generated in response to the falling edge of
signal EXT_VDn where each falling edge is counted as one line. The active pixel values in a line may be delayed by a
number of pixel values determined by data value RD_HSTART in response to the number of falling edges of signal
EXT_HDn. The number of pixel values is generated in response to the falling edge of signal EXT_HDn where each
falling edge is counted as one pixel value.

[0108] As shown in Fig. 18, in interlace mode, the second field is defined by the falling edge of signal EXT_VDn that
is not necessarily coincident with the falling edge of signal EXT_HDn. The next falling edge of signal EXT_HDn is then
the start of the data retrieval operation of the second field. If one of the bit values RD_FIELD1_ENA and
RD_FIELD2_ENA of register RD_CTRL is not set, then the data retrieval operation does not retrieve data during that
field. Signal RD_FLD1 indicates the switch time between field one and field two.

[0109] The sum of data value RD_HSTART and data value RD_HSIZE is desirably less than the total line time of
signal EXT_HDn. The sum of data value RD_VSTART and data value RD_VSIZE is desirably less than the frame or
field time of signal EXT_VDn.

[0110] In continuous mode and external timing mode, signal RD_RDY is a logic-zero. In non-continuous mode,
signal RD_RDY becomes a logic-one at the end of data retrieval of the frame in non-interlace mode or the second field in
interlace mode. In the external timing mode, signals RD_HA, RD_VA, and RD_FLD1 are the same as when internal
timing is used.

[0111] The write operation of frame store controller 605 is examined below with reference to Fig. 22. Image data is
stored in memory 550a using frame store controller 605 during the write operation. The write operation is defined as a
continuous raster scan write to a write region of interest WR_ROI, shown in Figs. 20 and 21, of memory 550a
synchronous with write clock signal WR_CLK. Register WR_ENA is set after all other write parameters are programmed.
Setting register WR_ENA activates the write operation, which then waits for the beginning of the input image data signal
WR_IN as defined by signals WR_HA and WR_VA. Signal WR_HA is in a logic high state when the data in a line is valid
(i.e. during the active picture interval) and in a logic low state otherwise (i.e. during the horizontal blanking interval).
Timing signal WR_VA is in a logic high state when the data in a line is valid (i.e. during the active picture interval) and
in a logic low state otherwise (i.e. during the vertical blanking interval). The rising edge of both signals WR_HA and WR_VA
indicates the start of valid data.

[0112] The image data signal WR_IN is stored starting at a memory location defined by data values stored in
registers WR_HZERO and WR_VZERO, shown in Figs. 20 and 21. The data values in registers WR_HZERO and
WR_VZERO are coordinates at which the image data is to be stored in memory 550a. All valid pixel data identified by
signals WR_HA and WR_VA, shown in Fig. 22, are stored in memory 550a, shown in Fig. 6, or subsampled in response
to data values stored in registers WR_HSAMPLE and WR_VSAMPLE (as shown in Fig. 21) and then stored in memory
550a, shown in Fig. 6.

[0113] Alternatively, the first pixel value stored may be defined by the data values stored in registers WR_HSTART
and WR_VSTART relative to the first valid data identified by signals WR_HA and WR_VA. In this case, for each horizontal
line, the number of pixel values specified by the data value in register WR_HSIZE, or the remaining valid data in the
horizontal line as defined by signal WR_HA, whichever is smaller, is stored in memory 550a consecutively at the rate
determined by the cycles of write clock signal WR_CLK, shown in Fig. 9. This process is repeated when valid data is
received from the next line. A total number of lines determined by the data value in register WR_VSIZE, or as many
active lines as remain as valid data, whichever is smaller, is stored in memory 550a during this operation.

[0114] Pixel values may be stored in memory 550a in an interlace mode, storing one or both fields of a frame. In
interlace mode, pixel values are stored in memory in response to signal WR_FLD1, shown in Fig. 9. Signal WR_FLD1
determines where a field in the input image signal WR_IN, shown in Fig. 9, is stored in memory 550a. When signal
WR_FLD1 is a logic-one, the input field in input signal WR_IN is stored starting at coordinates in memory 550a defined
by the data values in registers WR_HZERO and WR_VZERO. A line counter (not shown) is incremented by two for each
subsequent line and each subsequent line of the field is stored in memory 550a at a line position determined by the line
counter. When signal WR_FLD1 is a logic-zero, the input field in input signal WR_IN is stored starting at coordinates in
memory 550a defined by the data values in registers WR_HZERO and WR_VZERO+1. A line counter (not shown in the
drawings) is incremented by two for each subsequent line and each subsequent line of the field is stored in memory
550a at a line position determined by the line counter. The storing of data may also be set to a continuous operation
mode, in which data is stored in memory 550a continuously. The write region of interest WR_ROI, shown in Figs. 20
and 21, is a rectangular region of interest in memory 550a for the write operation.
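
The row addressing used when fields are written in interlace mode can be sketched as follows. The function `store_field` is hypothetical, and the memory is modeled as a dictionary from row number to line contents purely for illustration.

```python
def store_field(memory, field, wr_vzero, wr_fld1):
    """Store the lines of one field at every second row of the memory model.

    Field one (wr_fld1 true) starts at row wr_vzero, field two at row
    wr_vzero + 1. The line counter advances by two per stored line.
    """
    row = wr_vzero if wr_fld1 else wr_vzero + 1
    for line in field:
        memory[row] = line
        row += 2
    return memory

memory = {}
store_field(memory, ["line0", "line2"], wr_vzero=4, wr_fld1=True)
store_field(memory, ["line1", "line3"], wr_vzero=4, wr_fld1=False)
frame = [memory[row] for row in sorted(memory)]
```

Writing both fields in this way leaves the frame in memory in progressive line order, so that it can later be read in either interlace or non-interlace mode.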

[0115] Prior to the start of data storage in memory 550a, the write registers, described below, are programmed
using microprocessor 540, shown in Fig. 5. Then, register WR_ENA, shown in Fig. 22, is set to activate the data storage
operation. In response, signal WR_RDY is changed to a logic-zero until the storage operation is complete and the signal
WR_RDY changes to a logic-one. The value of signal WR_RDY can be retrieved from register FS_STATUS. In
continuous mode, signal WR_RDY remains a logic-zero.

[0116] Active pixel values identified by signals WR_HA and WR_VA, shown in Fig. 22, are stored in the write region
of interest, shown in Figs. 20 and 21, in memory 550a, shown in Fig. 6, synchronous with write clock WR_CLK. The
active pixel values may be subsampled prior to being stored in memory 550a in response to the data values in registers
WR_HSAMPLE and WR_VSAMPLE synchronous with write clock WR_CLK.

[0117] The frame store controller 605 includes write registers (not shown in the drawings). Each of the write
registers in the frame store controller 605 is programmable using microprocessor 540 using programming signal PS. These
registers include WR_CTRL, which specifies the write control parameters described below. The WR_INTERLACE
parameter indicates, when set to logic-one, that the write operation is in interlace mode, storing one or two fields.
The default setting is zero which indicates non-interlace mode. The WR_FIELD1_ENA parameter
enables the storage of the first field in interlace mode. The default setting is one which indicates storage of the field is
enabled. The WR_FIELD2_ENA parameter enables the writing of the second field in interlace mode. The default setting
is one which indicates storage of the field is enabled. The WR_CONTINUOUS parameter enables image data to be
stored to memory 550a continuously. The default setting is zero which indicates storage of image data is not
continuous. When set, the WR_IRQ_ENA parameter sets the signal FS_IRQ and a bit in register FS_STATUS at the
completion of the storage operation. The default setting is zero which indicates that signal FS_IRQ and the bit value of
register FS_STATUS are not set. When the WR_DATA_ZERO parameter is one, the pixel values in the write region of
interest WR_ROI are set to zero, independent of the input image data signal WR_IN. The default setting is zero.

[0118] The input image can be subsampled when data is written to memory 550a. Subsampling may be performed
independently in the horizontal and the vertical direction of the write region of interest WR_ROI. The subsampling period
can be a factor of, for example, one, two, or sixteen. The phase of the subsample operation is determined by the data
values in registers WR_HSTART and WR_VSTART. For example, if the sample period is a factor of two, and the data
value in register WR_HSTART is zero, then pixel values X0, X2, X4,... are stored. If the data value in register
WR_HSTART is one, then pixel values X1, X3, X5,... are stored. If the sample period is a factor of six and the data
value in register WR_HSTART is four, then pixel values X4, X10, X16,... are stored.
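
The relation between the sample period and the phase can be shown with a one-line model of horizontal subsampling. The function name `subsample` is hypothetical; the same selection would apply to lines in the vertical direction.

```python
def subsample(pixels, period, start):
    """Keep every period-th pixel value, beginning at index start (the phase)."""
    return pixels[start::period]

line = [f"X{i}" for i in range(20)]   # X0 ... X19

even = subsample(line, period=2, start=0)
odd = subsample(line, period=2, start=1)
sixth = subsample(line, period=6, start=4)
```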

[0119] The frame store controller 605 also contains a write mask register WR_MASK (not shown in the drawings).
Register WR_MASK is a 16-bit register. The upper 8 bits define the mask for the even pixels in the frame store controller
605, and the lower 8 bits define the mask for the odd pixels in the frame store controller 605. When activated, the pixel
values in the input image data signal WR_IN to be stored are masked. A bit plane is masked during the video write
operation when the associated bit is set low in the register WR_MASK. In default mode register WR_MASK is set to all ones.

[0120] The register WR_MASK is set before the first video write operation after power-up. Subsequently, register
WR_MASK may be changed when the mask needs to be changed. Register WR_MASK is not double buffered and,
therefore, should be changed when the video write operation is not active, and prior to the setting of register WR_ENA.
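
The effect of the write mask on one 16-bit memory word can be sketched as follows. The function `masked_write` is hypothetical, and the sketch assumes that a masked bit plane keeps the value previously held in memory; the text states only that such a plane is masked during the write.

```python
def masked_write(old_word, new_word, wr_mask=0xFFFF):
    """Combine a stored 16-bit word with an incoming word under WR_MASK.

    Bit planes whose mask bit is one take the incoming value. Bit planes
    whose mask bit is zero keep the stored value (an assumption of this
    sketch). The upper byte holds the even pixel, the lower byte the odd.
    """
    return (old_word & ~wr_mask & 0xFFFF) | (new_word & wr_mask)

default = masked_write(0x1234, 0xABCD)            # all ones: plain overwrite
even_only = masked_write(0x1234, 0xABCD, 0xFF00)  # write the even pixel only
low_planes = masked_write(0x1234, 0xABCD, 0x0F0F) # lower four planes of each pixel
```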
[0121] The exemplary addressing scheme described above requires a relatively large address bus. In this case it
is 19 bits. As an option a page register could be defined to reduce the address bus size. An image frame store operation
is implemented by performing read and write operations with two bytes at the same time.

[0122] The registers of the frame store controller 605 are double buffered so that these registers can be
programmed while the frame store controller is in operation. When the RD_ENA register is accessed, the read registers that
have been changed will be transferred to the active registers. When the WR_ENA register is accessed, the write registers
that have changed will be transferred to the active registers. Registers RD_RST, RD_ENA, RD_CTRL, WR_RST,
WR_ENA, WR_CTRL, and WR_MASK should not be double buffered.

[0123] To implement a triple port Frame Store (FS) with such high bandwidth with low-cost memory devices, FIFOs
and/or data caches are implemented internally to the FS controller as is shown in Fig. 23, which illustrates the interface
circuitry of the frame store controllers 605 and 610, shown in Fig. 6, with memories 550a and 550b. Access to the
memories 550a and 550b is scheduled so that it is consistent with the data burst capability of the memory devices.

[0124] As shown in Fig. 23, there is one data path 2310 to the external memory device. Data path 2310 is M bits,
where M is typically 8, 16, 32, or 64 bits. The video access terminals are V bits where V is typically 8 bits. The CPU
access terminal 2340 is D bits where D is an integer between eight and sixty four. Image data is retrieved from and
stored in memory 550a as data bursts of B memory words having a width of M bits. Each access terminal 2310, 2320,
and 2340 may be active at substantially the same time. As described above, the image data RD_OUT, shown in Fig. 9,
provided from frame store controller 605 and image data WR_IN, shown in Fig. 9, provided to frame store controller 605
are regular signals having fixed blanking times as determined by signals RD_HA, RD_VA, WR_HA, and WR_VA shown
in Figs. 15 and 22.

[0125] The data provided to FIFOs 2345 and 2350 from memory 550a are provided in bursts of data. In other
words, there is a block transfer of data from memory 550a to FIFO 2350 and from FIFO 2345 to memory 550a. The data
exchange between FIFOs 2345 and 2350 and memory 550a is provided in bursts of data. The data exchange
between FIFOs 2345 and 2350 and their respective access terminals 2320 and 2330 is provided as serial 8-bit words.

[0126] Data provided to and from CPU access terminal 2340 has a lower priority, and tasks that need to be performed
to control the memory 550a, such as memory refresh, are shared with use of the CPU access terminal 2340.

[0127] As is shown in Fig. 24, the frame store controller performs several operations during a frame store cycle. In
the frame store access cycle there are three memory access cycles: a video write cycle (VWC) of B words, a video read
cycle (VRC) of B words, and a CPU access cycle (CAC). The bandwidth of the image input and image output terminals
is determined in accordance with equation (7) below. The data value VB is the bandwidth of the terminals 2320 and
2330.

VB = ... where FSC = VWC + VRC + CAC (7)



[0128] Write FIFO 2345 and read FIFO 2350 accommodate the data to be provided to and from memories 550a
and 550b. Accordingly, the terminals 2320 and 2330 are regular and are not interrupted. The retrieval and storage of
image data is controlled using video read control 2380 and video write control 2375, respectively. Video read control
2380 and video write control 2375 control the image data retrieval and storage operations as described above with
reference to Figs. 10 through 22.

[0129] An access to the CPU access terminal occurs if there is a request from processor 540, shown in Fig. 5. The
access to the CPU access terminal is not initiated if the CPU access cycle CAC is being used by another function, such as
a DRAM refresh cycle. If the width D of the CPU data bus 2355 is equal to or less than the width M of the memory data
bus, then the CPU access latency will be more than CAC, less than (2 * FSC + CAC) when a refresh cycle needs to be
provided, and is typically slightly longer than FSC/2 + CAC provided that refresh cycles are infrequent. If width D is
larger than width M, the CPU access latency is increased by a value of IL determined in accordance with equation (9)
below.

IL = FSC * (D/M - 1) (9)
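
The latency bounds stated above can be evaluated numerically. The sketch below is illustrative only: the function `cpu_latency_bounds` is hypothetical, equation (9) is read as IL = FSC * (D/M - 1), which is an interpretation of the text, and the sample values of 45 and 16 cycles are taken from the eight-word burst figures given later in the description.

```python
def cpu_latency_bounds(fsc, cac, d_bits, m_bits):
    """Return (lower bound, typical value, upper bound) in clock cycles.

    fsc is the frame store cycle and cac the CPU access cycle. When the
    CPU bus is wider than the memory bus, the latency grows by
    IL = fsc * (d_bits / m_bits - 1), the assumed reading of equation (9).
    """
    extra = fsc * (d_bits / m_bits - 1) if d_bits > m_bits else 0
    lower = cac + extra
    typical = fsc / 2 + cac + extra
    upper = 2 * fsc + cac + extra
    return lower, typical, upper

same_width = cpu_latency_bounds(fsc=45, cac=16, d_bits=16, m_bits=16)
wider_cpu = cpu_latency_bounds(fsc=45, cac=16, d_bits=32, m_bits=16)
```

Under this reading, a CPU bus twice as wide as the memory bus adds one full frame store cycle to each bound.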

[0130] Since the typical CPU access latency may be a long period of time, a CPU cache memory, including write
cache memory 2360 and read cache memory 2365, is included within the circuitry of the frame store controller 605.
Write cache memory 2360 and read cache memory 2365 may be combined as a single memory.

[0131] The write cache memory 2360 stores image data from processor 540, shown in Fig. 5. During the CPU
access cycle CAC, frame store controller 605 stores image data in the write cache memory 2360. The entire contents
of the write cache memory 2360 may not be stored during the CPU access cycle CAC when the write cache memory
2360 includes more than M bits. During the CPU access cycle CAC, a burst of image data may be retrieved from or
stored in memory 550a.

[0132] When processor 540, shown in Fig. 5, initiates an image data retrieval operation and the image data to be
retrieved is stored in read cache memory 2365, the image data stored in read cache memory 2365 is provided directly
to processor 540 via CPU access terminal 2340. Otherwise, the appropriate image data from memory 550a is retrieved
during the next CPU access cycle. Then, the retrieved data is provided from read cache 2365 to processor 540. Image
data can be retrieved from the memory 550a and stored in read cache 2365 in anticipation of image data requests from
processor 540. For example, if the processor has retrieved image data from memory 550a during a previous CPU
access cycle CAC, the next sequential image data in memory 550a may be retrieved and stored in read cache 2365
during the next or subsequent CPU access cycles CAC. In this way, if processor 540 is retrieving image data
sequentially, the next sequential image data requested by processor 540 will already be stored in read cache 2365. Thus, CPU
access latency is minimized.
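
The benefit of fetching the next sequential data in advance can be seen in a toy model. The class `ReadCache`, its block size, and its unbounded storage are assumptions made for illustration; the sketch counts how often the processor would have to wait for a CPU access cycle and does not model cycle timing or cache replacement.

```python
class ReadCache:
    """Toy read cache that prefetches the next sequential block."""

    def __init__(self, memory, block=4):
        self.memory = memory   # backing store, modeled as a list of values
        self.block = block     # values fetched per CPU access cycle (assumed)
        self.lines = {}        # block base address -> list of cached values
        self.misses = 0        # reads that had to wait for the memory

    def _fetch(self, base):
        # Load one block from the backing store unless already cached.
        if base < len(self.memory) and base not in self.lines:
            self.lines[base] = self.memory[base:base + self.block]

    def read(self, addr):
        base = addr - addr % self.block
        if base not in self.lines:
            self.misses += 1           # processor waits for a CPU access cycle
            self._fetch(base)
        value = self.lines[base][addr - base]
        self._fetch(base + self.block)  # anticipate the next sequential request
        return value

memory = list(range(100, 116))
sequential = ReadCache(memory)
values = [sequential.read(addr) for addr in range(len(memory))]

scattered = ReadCache(memory)
scattered.read(8)
scattered.read(0)
```

In this model a sequential scan waits only for the first block, while non-sequential requests wait each time they fall outside the cached and prefetched blocks.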

[0133] If image data in the write cache memory 2360 includes image data which updates image data to be retrieved
by or which is already stored in read cache memory 2365, cache control 2370 controls the update of image data stored
in the read cache memory 2365. For example, if the read cache memory 2365 contains image data that is being
updated during a write cycle from processor 540 or by image data provided in input image signal WR_IN to terminal
2320, then the image data in read cache memory 2365 could be marked as "old" by the cache control 2370. Cache
control 2370 requests updated image data be retrieved from memory 550a and stored in read cache 2365 during the next
CPU access cycle CAC.

[0134] Memories 550a and 550b may be, for example, synchronous DRAM (SDRAM). In an SDRAM, the
memory is divided into two memory banks. An access to one of the banks is typically in the form of a burst of data of,
for example, up to eight data items. While one bank of data is being accessed, the access to the other bank may be
initiated. Thus, a long burst access to the memory is provided. The data stored in the SDRAM is stored in eight data
blocks in an interleaved fashion between the two memory banks. The data accesses are synchronously clocked and
provide a high data burst access of 100 MHz (10 ns access cycle).

[0135] Each burst of image data requires a setup time. The overhead for a burst read cycle is about eight clock
cycles of 10 ns. The write overhead is about five clock cycles of 10 ns. By increasing the burst size (BS), the overhead
becomes relatively insignificant. In this case the size of the write FIFO 2345 and the read FIFO 2350 is increased to
accommodate the increased amount of image data provided from memory 550a during a burst.

[0136] Also, if the burst length is large, a small size image will have significant overhead. For example, if the burst
length is 128 pixel values, yet the image data size that is being retrieved or stored is 16 pixel values, the effective access
rate is approximately 16/128 that of a standard video rate. Thus, the burst length is chosen depending upon the
particular application.
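
The trade-off between setup overhead and burst length can be put in numbers. Both helper functions below are hypothetical; the first uses the stated overhead of about eight cycles for a burst read, and the second models the small-image case as the useful fraction of a burst.

```python
def burst_efficiency(burst, overhead):
    """Fraction of a burst access cycle spent transferring data."""
    return burst / (burst + overhead)

def useful_fraction(image_pixels, burst):
    """Fraction of a burst that carries pixel values of a small image."""
    return min(image_pixels, burst) / burst

short_read = burst_efficiency(burst=8, overhead=8)     # half the cycle is setup
long_read = burst_efficiency(burst=256, overhead=8)    # overhead nearly vanishes
small_image = useful_fraction(image_pixels=16, burst=128)
```

A long burst therefore helps large images and hurts small ones, which is why the text leaves the burst length to the application.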

[0137] If the width of memory 550a is M bits and M is a multiple of the width V of access terminals 2320 and 2330,
the effective video I/O rate of the frame store controller 605 may be increased. In that case, write FIFO 2345 and read
FIFO 2350 are as wide as the memory bus 2310, and bus width converters need to be added at the input of the write
FIFO and at the output of the read FIFO.

[0138] Tables 3-5 provide the image retrieval and storage rates as a function of the width M of memory 550a and 
the width V of the access terminals 2320 and 2330. Table 3 illustrates the total number of 10 ns clock cycles required 
for each video write cycle VWC of B words, video read cycle VRC of B words, and a CPU access cycle CAC. 



TABLE 3

Burst size:            8    16    32    64   128   256
Video read cycles     16    24    40    72   136   264
Video write cycles    13    21    37    69   133   261
CPU access cycles     16    24    40    72   136   264



[0139] Table 4 provides the estimated video I/O rate as a function of burst size and width of the memory. Table 4
assumes that each of the accesses (the video write cycle VWC, the video read cycle VRC, and the CPU access cycle CAC)
uses equal size data bursts. The values in Table 4 are shown as a function of the width M of the data bus 2310,
shown in Fig. 23, between memory 550a and the interface circuitry.



TABLE 4

Burst size:       8    16    32    64   128   256
Total cycles:    45    69   117   213   405   789
M=8              18    23    27    30    32    32
M=16             36    46    55    60    63    65
M=32             71    93   109   120   126   130



[0140] Table 5 provides the estimated video I/O rate as a function of burst size and memory data width. In Table 5 it
is assumed that the video write cycle VWC and the video read cycle VRC use equal size data bursts, but the CPU access
cycle CAC uses eight pixel value bursts. The values in Table 5 are shown as a function of the width M of the
data bus 2310, shown in Fig. 23, between memory 550a and the interface circuitry.

TABLE 5

Burst size:       8    16    32    64   128   256
Total cycles:    45    61    93   157   285   541
M=8              18    26    34    41    45    47
M=16             36    52    69    82    90    95
M=32             71   105   138   163   180   189



[0141] Tables 4 and 5 demonstrate that if a 50 MHz clock rate is used by the PPP 560, shown in Fig. 6, M should 
be sixteen and B should be sixteen or thirty two depending on the access speed and latency requirements of the CPU 
access cycle CAC. Tables 4 and 5 also indicate that if a 100MHz clock rate is desired, M should be thirty two. 
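
The entries of Tables 3 to 5 appear to follow from a simple cycle count. The Python sketch below is an inferred model, not a formula stated in the text: it assumes a setup overhead of eight cycles for read and CPU accesses and five cycles for writes, a 100 MHz memory clock, pixel values of V = 8 bits, and a rate equal to the pixel values delivered per port during one complete write, read, and CPU sequence. All names are hypothetical.

```python
BURSTS = (8, 16, 32, 64, 128, 256)
CLOCK_MHZ = 100      # 10 ns memory cycle, from the text
READ_OVERHEAD = 8    # cycles; also assumed for the CPU access cycle
WRITE_OVERHEAD = 5   # cycles
V_BITS = 8           # assumed width of one pixel value


def cycle_counts(burst, cpu_burst=None):
    """Return (VRC, VWC, CAC) lengths in clock cycles for one sequence."""
    cpu_burst = burst if cpu_burst is None else cpu_burst
    vrc = burst + READ_OVERHEAD
    vwc = burst + WRITE_OVERHEAD
    cac = cpu_burst + READ_OVERHEAD
    return vrc, vwc, cac


def video_rate_mhz(burst, m_bits, cpu_burst=None):
    """Estimated per-port video I/O rate for an M-bit wide memory bus."""
    total = sum(cycle_counts(burst, cpu_burst))
    return CLOCK_MHZ * burst * (m_bits // V_BITS) / total


if __name__ == "__main__":
    for m in (8, 16, 32):
        equal = [round(video_rate_mhz(b, m)) for b in BURSTS]
        cpu8 = [round(video_rate_mhz(b, m, cpu_burst=8)) for b in BURSTS]
        print(f"M={m}", equal, cpu8)
```

Under these assumptions the rounded results agree with the tabulated rates, for example 46 for M = 16 with a burst size of 16 when all bursts are equal, and 52 when the CPU burst is held at eight pixel values.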

[0142] Alternately, memory 550a may be a Rambus™ DRAM (RDRAM), which is a DRAM memory with a special
memory interface that allows the transfer of data bursts at up to 500 MHz. The Rambus™ based memory devices have
an eight bit data transfer bus, and use the same bus for addressing. Inside the memory device there are two memory
banks, each with a cache to provide quick access response to requests for data transfers. If the data requested from
the memory device is in the memory device cache, the overhead to start a burst transfer is 40 ns. If the data is not in
cache, then the overhead is in the range of 84 to 156 ns. Table 6 summarizes the effect of the overhead as a function
of burst data length for a single port access cycle. Rambus supports data bursts of 8 to 256 pixel values.



TABLE 6

Burst length:       setup:      8     16     32     64    128    256
cache hit           40 ns      7.0    4.5    3.3    2.6    2.3    2.2
no cache hit min    84 ns     12.5    7.3    4.6    3.3    2.7    2.3
no cache hit max    156 ns    21.5   11.8    6.9    4.4    3.2    2.6



[0143] Table 7 below demonstrates the effective data transfer rate possible for single port access as a function of
the burst length. In Table 7, for example, if the data to be retrieved is in cache and the burst length is 16, the transfer
rate of data from memory 550a to the frame store controller is 222 MHz.

TABLE 7

Burst length:              8    16    32    64   128   256
cache hit (MHz)          143   222   308   381   432   464
no cache hit min (MHz)    80   138   216   302   376   430
no cache hit max (MHz)    47    85   145   225   311   383
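
The figures in Tables 6 and 7 are consistent with a model in which each burst pays its setup overhead once and then transfers one value every 2 ns (the 500 MHz peak rate). The Python sketch below states that model explicitly. It is an inference from the tabulated numbers rather than a formula given in the text, and the names are hypothetical.

```python
BURSTS = (8, 16, 32, 64, 128, 256)
NS_PER_VALUE = 2.0  # 500 MHz peak transfer rate, from the text
SETUP_NS = {
    "cache hit": 40.0,
    "no cache hit min": 84.0,
    "no cache hit max": 156.0,
}


def burst_time_ns(burst, setup_ns):
    """Total time for one burst: setup overhead plus the data transfer."""
    return setup_ns + burst * NS_PER_VALUE


def ns_per_value(burst, setup_ns):
    """Average time per transferred value (the quantity listed in Table 6)."""
    return burst_time_ns(burst, setup_ns) / burst


def effective_rate_mhz(burst, setup_ns):
    """Effective single-port transfer rate in MHz (as in Table 7)."""
    return 1000.0 * burst / burst_time_ns(burst, setup_ns)


if __name__ == "__main__":
    for label, setup in SETUP_NS.items():
        print(label, [round(effective_rate_mhz(b, setup)) for b in BURSTS])
```

For a cache hit and a burst length of 16, the model gives 16 values in 72 ns, or about 222 MHz, matching the example quoted for Table 7.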






[0144] Table 8 below demonstrates the effective transfer rate possible for dual port access (one write cycle VWC
and one video read cycle VRC), for triple port access (one write cycle VWC, one read cycle VRC, and a CPU access
cycle CAC), and for five port access (two write cycles VWC, two read cycles VRC, and a CPU access cycle CAC). The
estimated transfer rates assume that the write cycles VWC do not have a memory size problem and the read cycles VRC
and the CPU access cycle CAC have either a minimum or a maximum cache access overhead.

TABLE 8

Burst length:             8    16    32    64   128   256
Dual port min (MHz)      51    85   127   168   201   223
Dual port max (MHz)      35    62    99   142   181   210
Triple port min (MHz)    31    53    80   108   131   147
Triple port max (MHz)    20    36    59    87   114   136
Five port min (MHz)      19    33    49    66    79    89
Five port max (MHz)      14    25    42    63    84   100



[0145] Table 8 demonstrates that with triple port access and a 50 MHz access rate, the burst length should be 32
pixel values. For a 100 MHz access rate the burst length should be 128 pixel values. If the CPU access is given lower
priority it is possible to reduce the burst lengths. The burst lengths are fully programmable via processor 540, shown in
Fig. 5.
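
The multi-port estimates can be approximated by summing the burst times of all accesses in one sequence. The Python sketch below is an inferred model with hypothetical names: it assumes each write cycle pays the 40 ns cache-hit setup, each read or CPU cycle pays either the 84 ns minimum or the 156 ns maximum setup, and each value takes 2 ns to transfer. Under these assumptions the model reproduces the dual port, triple port, and five port minimum rows of Table 8; it does not reproduce the five port maximum row as printed, so that row is not checked here.

```python
BURSTS = (8, 16, 32, 64, 128, 256)
NS_PER_VALUE = 2.0                       # 500 MHz peak rate, from the text
HIT_NS, MISS_MIN_NS, MISS_MAX_NS = 40.0, 84.0, 156.0


def multiport_rate_mhz(burst, writes, reads, cpu, miss_ns):
    """Per-port rate when all accesses share the memory in sequence.

    Assumption: writes use the cache-hit setup, reads and CPU accesses
    use the given cache-miss setup.
    """
    write_ns = writes * (HIT_NS + burst * NS_PER_VALUE)
    read_ns = (reads + cpu) * (miss_ns + burst * NS_PER_VALUE)
    return 1000.0 * burst / (write_ns + read_ns)


if __name__ == "__main__":
    print([round(multiport_rate_mhz(b, 1, 1, 1, MISS_MIN_NS)) for b in BURSTS])
    print([round(multiport_rate_mhz(b, 1, 1, 1, MISS_MAX_NS)) for b in BURSTS])
```

For triple port access with a burst length of 32, the model gives 80 MHz with minimum overhead and 59 MHz with maximum overhead, both above a 50 MHz access rate.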

[0146] If a large memory bandwidth is desired, other write access terminals 2320, shown in Fig. 23, and other read
access terminals 2330, shown in Fig. 23, may be added to the frame store controller. This would create a frame store
memory having more than three access terminals. The number of access terminals is limited only by the bandwidth of
the memory. Access terminals may be added in balanced read-write pairs or more access terminals (read or write) of
one type may be added than of the other type. Assuming they are supported by a high-bandwidth memory, the addition
of more access terminals is functionally equivalent to implementing multiple separate frame store controllers. For
example, a system with a 50 MHz video clock rate, using burst lengths of 64 pixels, or 32 pixels if CPU access is given
lower priority, may be implemented using a frame store controller 605 having five access terminals.

[0147] An alternative method for providing multiple access terminals to an external memory is shown in Figs. 25
through 28. Briefly, this method uses a plurality of buffer memories coupled to a memory controller to provide a
respective plurality of memory ports. This embodiment of the invention utilizes a feature of DRAM design which allows
for very fast burst-mode data transfers. Using this feature, the memory can fill or empty several buffers quickly while
the buffers provide data to be processed, or receive processed data, at a lower rate. Fig. 25 is a block diagram of a
pipelined parallel processor (PPP) 2500 which uses multiple port buffers 2512 coupled to a multi-port frame store
controller 2510. The controller coordinates requests for data between the memory 550 and the multiple port buffers
2512. In the exemplary embodiment of the invention, one of the port buffers is configured to receive input digital video
data provided by a digitizer 2514 and another port buffer is configured to provide output digital video data to a video
encoder 2516.

[0148] The PPP shown in Fig. 25 employs many of the same circuits described above with reference to Fig. 6.
These include a pyramid filter 2515, an image warp processor 2525, an image motion estimation processor 2535, a
look-up-table (LUT) 2545 and a non-blocking crossbar switch 2540. As described above, the operation of the PPP is
controlled by a control processor 2560 through a control interface 2550. The exemplary control interface allows control
registers in the various components of the PPP to be set and interrogated.

[0149] In summary, individual port buffers of the multi-port frame store memory are assigned to provide data to, or
receive data from, respective processing elements of the PPP 2500 and, thus, to receive data from and provide data to
the memory 550. Each port buffer, except for the ones coupled to the digitizer 2514 and encoder 2516, is bi-directional;
that is to say, it may either provide data to, or receive data from, the memory 550 and the processing elements of the
PPP 2500. The configuration of a particular buffer is determined by register values that are associated with the buffer.
These include HZERO, VZERO, HSIZE, VSIZE, H_START, V_START and SYNC_SOURCE. These registers have the
same function for the buffers 2512 as for the READ FIFO 2350 and WRITE FIFO 2345 described above with reference
to Fig. 23. When the port buffer shown in Fig. 25 is configured to store data into the memory 550, these registers act
as the corresponding registers for the WRITE FIFO 2345; when the buffer is configured to fetch data from memory 550,
these registers act as the corresponding registers for the READ FIFO 2350. The PPP may be programmed to assign
specific port buffers to specific processing units or the buffers may be dynamically assigned by the CPU 2560 on an
as-needed basis.

[0150] Figure 26 is a block diagram which shows details of the multi-port buffer 2512 and the frame store controller
2510. As shown in Figure 26, the multi-port buffer element 2512 includes several buffer memories 2610, each of which
is coupled to write control circuitry 2612 and read control circuitry 2614. Each buffer memory 2610 is coupled to receive
data from, and provide data to, the crosspoint switch 2540. In addition, each memory 2610 is coupled to receive data
from or provide data to the memory 550 via the multi-port memory controller 2510. The connection between each of the
buffer memories 2610 and the crosspoint switch 2540 may be via a single bus (not shown) to which the buffer memory
2610 is coupled via pairs of three-state gates (not shown). These three-state gates may be configured either to provide
data to the bus from the crosspoint switch or to provide data to the crosspoint switch from the bus. The interface
between the buffer memory 2610 and the multi-port frame store controller may also use pairs of three-state gates (not
shown) to connect the buffer to provide data to, or receive data from, the memory 550. In the exemplary embodiment of
the invention, the pairs of three-state gates between the port buffers 2610 and the memory 550 form a
multiplexer-demultiplexer pair, shown in Figure 26 as the multiplexer 2620 and demultiplexer 2630. These three-state
gates are controlled by an arbitration circuit 2640, as described below.

[0151] In the exemplary embodiment of the invention, each of the buffer memories 2610 includes a sufficient
number of memory cells to hold one horizontal line of video data of an input image. It is contemplated, however, that
smaller buffer memories may be used when the bandwidth of the memory 550 is sufficient to keep the buffers full. In
addition, it is contemplated that the buffer memories 2610 may hold more than one horizontal line of image data. Each
of the buffers may be, for example, a bi-directional FIFO memory element which receives data at one rate and provides
it at another data rate. In the exemplary embodiment of the invention the rate at which data is exchanged between the
buffer memories 2610 and the memory 550 is greater than the rate of data exchange between the buffers 2610 and the
crosspoint switch 2540.

[0152] The exemplary multi-port buffer structure 2512 includes eight buffer memories. It is contemplated, however,
that the buffer structure 2512 may include more or fewer memories. The number of buffer memories used in the buffer
structure 2512 is limited only by the relative bandwidths of the memory 550 on the one hand and of the crosspoint
switch 2540 and/or the individual processes coupled to the crosspoint switch 2540 on the other hand.

[0153] When the CPU 2560 configures a buffer memory 2610 as an output buffer, the memory 2610 is controlled
by its respective write control circuitry 2612. When the memory buffer 2610 is configured as an input buffer, it is
controlled by its respective read control circuitry 2614. The control registers for the buffer are shared between the read
control circuitry and the write control circuitry. In the exemplary embodiment of the invention, each buffer may be
assigned to act only as an input buffer or as an output buffer in a given process configuration. It is contemplated,
however, that a single buffer may be configured to operate as both an input buffer and an output buffer if the data rate
of the process supports this use. In this configuration, however, it may be desirable to implement separate control
register sets for each of the write control circuits 2612 and read control circuits 2614.

[0154] As described above, the multi-port memory controller 2510 includes a multiplexer 2620 which selectively
provides data from one of the buffer memories 2610 or from the CPU cache memory 2550 to the memory 550. The
memory controller 2510 also includes a demultiplexer 2630 which provides data from the memory 550 to a selected one
of the buffer memories 2610 or to the CPU cache memory 2550. The multiplexer 2620 and the demultiplexer 2630 are
controlled by an arbitration circuit 2640. The arbitration circuit monitors each of the buffers 2610 both for data store and
data fetch requests. The arbitration circuit also monitors the memory 550 for intervals at which requested data is being
provided by the memory 550 and for intervals at which data to be stored may be transferred to the memory 550.

[0155] In the exemplary embodiment of the invention, the arbitration circuit 2640 implements a round-robin
scheduling protocol overlaid with a simple priority protocol. According to the combined protocol, each buffer 2610 is
assigned a priority by the CPU 2560. When the arbitration circuit determines that a memory read request or a memory
write request may be handled, it will select the next buffer in the cycle unless there is an outstanding request from a
buffer having a higher priority value.
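
One possible reading of the combined protocol is sketched below in Python. The class and method names are hypothetical, and the details are assumptions: among the buffers with outstanding requests, the one with the highest priority value is served, and ties are broken in round-robin order starting after the buffer served last. Whether the round-robin position advances after a priority override is not stated in the text; here it does.

```python
class Arbiter:
    """Round-robin arbiter overlaid with per-buffer priorities (a sketch)."""

    def __init__(self, priorities):
        self.priorities = list(priorities)
        # Start so that buffer 0 is the first candidate in the cycle.
        self.last = len(self.priorities) - 1

    def grant(self, requests):
        """Return the index of the buffer to serve, or None if none request."""
        n = len(self.priorities)
        # Visit buffers in round-robin order, starting after the last served.
        order = [(self.last + 1 + k) % n for k in range(n)]
        pending = [i for i in order if requests[i]]
        if not pending:
            return None
        top = max(self.priorities[i] for i in pending)
        # The first pending buffer in cycle order holding the top priority wins.
        winner = next(i for i in pending if self.priorities[i] == top)
        self.last = winner
        return winner
```

With equal priorities this reduces to plain round-robin service; raising the priority of one buffer, for example the one fed by a digitizer, lets it be served ahead of its turn whenever it has a request pending.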

[0156] Fig. 27 is a block diagram which illustrates how a simple filtering operation may be performed on the PPP
2500 using the memory structure shown in Fig. 26. To simplify the explanation, Fig. 27 shows only some of the
components of the PPP 2500; in particular, it shows only the pyramid filter 2515, the crosspoint switch 2540, the buffer
structure 2512 including only three buffers 2610, and the multi-port frame store controller 2510. The pyramid filter 2515
is shown as its component parts, including delay lines 2710, an edge controller 2712, a convolver 2714, a multiplexer
2716 and an arithmetic and logic unit (ALU) 2718. The function and operation of these components is described in the
above-referenced U.S. Patent no. 5,359,674 to van der Wal.

[0157] In the system shown in Fig. 27, one of the buffers, 2610A, is assigned by the CPU 2560 (not shown in Fig.
27) to provide data to the filter 2515 from the memory 550 while the other two buffers 2610B and 2610C are assigned
by the CPU to receive data from the filter 2515 and provide that data to the memory 550. In the example shown in Fig.
27, the filter 2515 is to receive an image from the memory 550 at one level of a pyramid representation of an image and
to derive Gaussian and Laplacian images for the next lower resolution pyramid level.

[0158] In the example configuration shown in Fig. 27, the buffer memory 2610A is assigned to fetch consecutive
lines of the input image from a predetermined image area in the memory 550. When the buffer 2610A is full, it
establishes a predetermined connection with the delay lines 2710 through the crosspoint switch 2540. The delay lines
2710 fetch the data from the buffer 2610A until the buffer is empty, at which time the buffer requests the next image line
from the predetermined image area. When the delay lines 2710 have stored five image lines, data from these lines is
provided to the edge control circuitry, which may, for example, implement a pixel mirroring of the edge pixels as they are
applied to the convolver 2714 to ensure that the entire image is processed consistently. The convolver is a separable
vertical and horizontal filter which first combines samples from the five vertical image lines provided by the delay lines
2710 to form a single vertically filtered line and then combines samples in the resulting horizontal line to form a
horizontally and vertically filtered horizontal line of samples. In the exemplary embodiment of the invention, the
convolver 2714 implements a spatial low-pass filter. Thus, the output signal of the convolver 2714 is Gaussian filtered
image data. The multiplexer 2716 selects a central one of the lines of input samples provided by the edge control
circuitry 2712 and applies it to one input port of the ALU 2718. The other input port of the ALU is the low-pass filtered
image line from the convolver 2714. The ALU subtracts the Gaussian image data from the input image data to produce
Laplacian filtered image data.
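
The data flow through the delay lines, edge controller, convolver, and ALU can be modeled in software. The Python sketch below is illustrative only. The five-tap binomial kernel (1, 4, 6, 4, 1)/16 is an assumption, since the text says only that the convolver is a low-pass filter, and the mirroring rule (reflection about the edge pixel) is one plausible form of edge handling. The function names are hypothetical and the image must be at least three pixels in each dimension.

```python
KERNEL = (1, 4, 6, 4, 1)  # assumed taps; the text only says "low-pass"
NORM = sum(KERNEL)        # 16


def mirror(index, length):
    """Reflect an out-of-range index back into [0, length)."""
    if index < 0:
        return -index
    if index >= length:
        return 2 * (length - 1) - index
    return index


def filter_line(samples):
    """Horizontal five-tap filter over one line, with mirrored edges."""
    n = len(samples)
    return [
        sum(w * samples[mirror(i + k - 2, n)] for k, w in enumerate(KERNEL)) / NORM
        for i in range(n)
    ]


def gaussian_and_laplacian(image):
    """Return (gaussian, laplacian) for an image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    # Vertical pass: combine five image lines into one vertically filtered line.
    vertical = [
        [
            sum(w * image[mirror(r + k - 2, rows)][c] for k, w in enumerate(KERNEL)) / NORM
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    # Horizontal pass over each vertically filtered line.
    gaussian = [filter_line(line) for line in vertical]
    # ALU step: Laplacian = input line minus low-pass filtered line.
    laplacian = [
        [image[r][c] - gaussian[r][c] for c in range(cols)] for r in range(rows)
    ]
    return gaussian, laplacian
```

For a constant image the Gaussian output equals the input and the Laplacian output is zero everywhere, including at the mirrored edges.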

[0159] The filter 2515 provides the Gaussian image data to buffer memory 2610B while the ALU 2718 provides the
Laplacian image data to the buffer memory 2610C. Each of these buffer memories is assigned an image location in the
memory 550. As set forth above, the image data for the Gaussian and Laplacian images may be decimated as it is
stored into the buffers 2610B and 2610C. The buffer memories 2610B and 2610C store data into the memory 550 as
soon as the buffers become full.

[0160] In the exemplary embodiment of the invention, only active image data is stored into the memory 550. The
signals horizontal active (HA) and vertical active (VA) are asserted, for example, by the buffer memory 2610A to the
processing circuitry of the PPP 2500 when the buffer is full. These signals indicate to the delay lines 2710 that the data
may be accessed. When the buffer is empty, these signals are no longer asserted and the delay lines 2710 stop
transferring data from the buffer 2610A. In the same way, when the convolver 2714 and ALU 2718 provide data to the
respective buffer memories 2610B and 2610C, they provide the data with the signals HA and VA asserted. At the end of
a line of data, these signals are no longer asserted and the respective buffers signal the frame store controller that their
lines of data are ready to be stored into the memory 550.

[0161] Fig. 28 is a timing diagram which is useful for describing the parallelism that is achieved in the circuitry 
shown in Fig. 25. This timing diagram illustrates essentially the same image alignment process as is shown in Fig. 8. 
Each of the horizontal lines in Fig. 28 illustrates a parallel process that is being performed in the PPP 2500. 

[0162] Line 2810 represents the activity of the input buffer 2610 that is coupled to receive the input image from the
digitizer 2514. Image data is continually being provided to the buffer and it is being continually stored in the designated
image location of the memory 550 over the 16.6 msec image field interval. Once an image field has been stored into
memory 550, it is accessed, as shown by line 2812, to decompose the image into a five level pyramid (levels 0 to 4). At
the end of the time interval represented by line 2812, the input image and four Gaussian images and four Laplacian
images have been stored in respectively different image areas of the memory 550. Also stored in the memory 550 are
an aligned current image and a pyramid decomposition of the aligned current image including four Gaussian images
and four Laplacian images. The current image and its decomposition are in the memory as a result of an alignment
operation previously performed on the current image when it was the input image.

[0163] The next line in the process, line 2814, represents the correlation operation that is performed between the
level 4 Laplacian pyramid of the input image and the level 4 Laplacian pyramid of the current image. This operation is
allotted more time than later operations as it is the initial coarse alignment operation between the input image and the
current image. After this initial coarse alignment has been performed, the next step in the process, represented by line
2816, conforms the level 4 Laplacian pyramid of the input image to the level 4 Laplacian image of the current image by
successively warping (W) the Laplacian of the input image, estimating any remaining motion (ME) between the input
image and the current image by calculating an image transform, as described above, and then solving the image matrix
(MS) to obtain the transformation parameters. Each of these operations is performed four times in the interval
represented by line 2816, as illustrated by the blown-up segments 2830, 2832, 2834 and 2836, shown in Fig. 28. At each
level, the transformation parameters are retained. These parameters may, for example, be retrieved by the CPU 2560
and stored into a data storage area of the memory 550 using the unbuffered memory ports 2621 and 2631 (shown in
Fig. 26) between the CPU cache memory 2550 (shown in Fig. 25) and the memory 550.

[0164] At the process steps represented by the lines 2818 and 2820, the same steps are performed for the level 3
and level 2 Laplacian pyramids. At each of these steps, the first warp operation warps the input Laplacian image
according to the total transformation determined for the previous Laplacian image level. At the end of the time interval
represented by line 2820, a set of image transform parameters has been determined which warp the input image to
the current image to the accuracy of the level 2 pyramid. The inventors have determined that this is sufficiently accurate
for an image stabilization system.

[0165] In the intervals represented by lines 2822 and 2824, the input luminance and chrominance components are
warped according to the transform parameters determined at the steps represented by the lines 2812, 2814, 2816,
2818 and 2820 to produce an aligned input image. This input image then becomes the current image. Finally, in the
interval represented by line 2826, the aligned input image is provided to the video encoder 2516 (shown in Fig. 25).

[0166] As shown in Figure 28, four processes are occurring in parallel on four different images. Line 2826
represents image 0 being provided to the encoder 2516 while lines 2822 and 2824 represent image 1 being warped; lines
2812, 2814, 2816, 2818 and 2820 represent image 2 being aligned to image 1; and line 2810 represents image 3 being
received and stored into the memory 550.

[0167] Although illustrated and described herein with reference to certain illustrative embodiments, the present
invention is nevertheless not intended to be limited to the details shown. Rather, various modifications may be made in
the details without departing from the scope and spirit of the invention.

Claims 



1. Image processing apparatus comprising:

a memory;

an integrated circuit connected to the memory, the integrated circuit including:

a function circuit which performs a signal processing function on at least one of the plurality of image signals,

a filter which applies a filtering function to at least one of the plurality of image signals, and

a frame store controller having more than five memory ports, the frame store controller including:

means for concurrently storing at least two image signals in the memory, and
means for concurrently reading at least two image signals from the memory,

a crosspoint switch coupled to the function circuit, the filter means and the frame store controller for
concurrently routing at least two image signals to the filter, the frame store controller, and the function circuit,
and

a plurality of memory port buffers, each memory port buffer including a first-in-first-out (FIFO) memory
element, wherein the plurality of memory port buffers is coupled to the crosspoint switch and to the frame
store controller.



2. Apparatus according to claim 1 wherein at least one of the memory port buffers is programmable to receive data
from or provide data to the crosspoint switch and to provide data to or receive data from the memory, respectively.

3. Apparatus according to claim 2 wherein the integrated circuit further comprises a controller that provides
programming signals to configure the filtering means, the frame store controller, and the processing means to perform a
series of predefined operations to produce an output signal.

4. Apparatus according to claim 2 wherein each of the memory port buffers includes a plurality of programmable
registers that identify an area of the memory for data transfer between the memory and the memory port buffer.

5. Apparatus according to claim 2 further including an arbitration circuit which selectively establishes connections
between the memory and selected ones of the memory port buffers.

6. Apparatus according to claim 5 wherein the arbitration circuit further includes a scheduling processor that
combines a round-robin scheduler and a priority scheme for determining which memory port buffer of the plurality of
memory port buffers is given access to the memory.



7. Apparatus for processing image data arranged in frames of image lines, the apparatus comprising:

a memory;

an integrated circuit connected to the memory, the integrated circuit including:

an input port for receiving an input image signal including a first image frame and a second image frame;

a crosspoint switch;

a multi-port frame store controller including:

a plurality of programmable bi-directional port buffers, each port buffer being coupled to the crosspoint
switch and to the memory, and including a first-in-first-out memory;

an input port buffer configured to receive the image data signal provided at the input port and to store
the received image data signal into the memory;

an arbitration circuit which selectively establishes connections among the memory, the plurality of
bi-directional port buffers and the input port buffer;

a first image processor having at least one input port coupled to the crosspoint switch to receive image
data from at least a respective first one of the bi-directional port buffers and having at least one output port
coupled to the crosspoint switch to provide data to at least a respective second one of the bi-directional
port buffers;

a second image processor having at least one input port coupled to the crosspoint switch to receive image
data from at least a respective third one of the bi-directional port buffers and having at least one output port
coupled to the crosspoint switch to provide data to at least a respective fourth one of the bi-directional port
buffers; and

a controller coupled to the integrated circuit to program at least the first and third bi-directional port buffers to
fetch data from respectively different areas of the memory, to concurrently provide the fetched data to the
respective first and second image processors via the crosspoint switch, to program at least the second and
fourth bi-directional port buffers to concurrently receive data from the respective first and second image
processors, and to store the received data into respectively different areas of the memory.

8. Apparatus according to claim 7, wherein each of the bi-directional port buffers includes a number of memory cells
sufficient to hold one line of the input image signal.

9. Apparatus according to claim 7 wherein each of the bi-directional port buffers includes a plurality of programmable
registers that identify the respective area of the memory for data transfer between the memory and the
bi-directional port buffer.

10. Image processing apparatus comprising:

a memory;

an integrated circuit connected to the memory, the integrated circuit including:

a function circuit which performs a signal processing function on at least one of the plurality of image
signals, and

a crosspoint switch coupled to the function circuit, the filter means and the frame store controller for
concurrently routing at least one image signal to the function circuit;

a frame store controller including:

a plurality of programmable bi-directional first-in-first-out buffer memories coupled to the crosspoint
switch and to the memory;

a plurality of programmable registers that identify a respective area of the memory for data transfer
between the memory and a respective one of the plurality of programmable bi-directional port buffers
and a direction for that transfer; and

an arbitration circuit which selectively establishes connections between the memory and a selected
one of the plurality of programmable bi-directional port buffers;

a controller coupled to the integrated circuit to program the crosspoint switch and the plurality of
bi-directional port buffers to fetch data from at least one predetermined area of the memory, to provide data to at
least one further predetermined area of the memory, to route the data fetched from the frame store
controller to the function unit and to route the data provided by the function unit to the frame store controller.
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